Modulating one or more parameters of an audio or video perceptual coding system in response to supplemental information

ABSTRACT

A method of modifying the operation of the encoder function and/or the decoder function of a perceptual coding system in accordance with supplemental information, such as a watermark, so that the supplemental information may be detectable in the output of the decoder function. One or more parameters are modulated in the encoder function and/or the decoder function in response to the supplemental information.

TECHNICAL FIELD

[0001] The invention relates to steganography in the context of audio or video signals. More particularly, the invention relates to modifying the operation of the encoder and/or the decoder of an audio or video perceptual coding system in accordance with supplemental information so that the supplemental information may be detectable in the output of the decoder. Such supplemental information is often referred to as a “watermark”. Watermarking is an aspect of steganography.

BACKGROUND ART

Steganography and Watermarking

[0002] Steganography is the science of hiding a signal within another signal. Steganographic algorithms or processes may be “robust” or “fragile”; that is, it may be very difficult or very easy to corrupt the hidden signal. Considering audio applications, one very fragile steganographic technique is to use the least significant bit of a PCM channel to carry a data stream independent from the audio program content, which would be carried in the upper bits. The hidden data channel carried in the least significant bit does not significantly distort the audio program, but rather acts as a low-level dither signal. This technique is fragile in the sense that simple audio processing, such as gain changes or digital-to-analog conversion, can destroy the data signal.
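By way of illustration only, the fragility of the least-significant-bit technique may be sketched as follows. The function names, sample values and payload are hypothetical and are not taken from any particular system; samples are treated as plain signed integers.

```python
def embed_lsb(samples, bits):
    """Overwrite the least significant bit of each PCM sample with one data bit."""
    if len(bits) > len(samples):
        raise ValueError("more data bits than samples")
    out = list(samples)
    for i, bit in enumerate(bits):
        # Clear the LSB, then set it to the data bit; upper bits are untouched.
        out[i] = (out[i] & ~1) | (bit & 1)
    return out


def extract_lsb(samples, count):
    """Read the hidden data stream back from the least significant bits."""
    return [s & 1 for s in samples[:count]]


def apply_gain(samples, gain):
    """A simple gain change, standing in for ordinary audio processing."""
    return [int(round(s * gain)) for s in samples]


# Hypothetical PCM samples and payload.
pcm = [1000, -2001, 1234, 4321, -77, 15, 300, -9000]
payload = [1, 0, 1, 1, 0, 0, 1, 0]

marked = embed_lsb(pcm, payload)
recovered = extract_lsb(marked, len(payload))                     # intact payload
after_gain = extract_lsb(apply_gain(marked, 0.5), len(payload))   # corrupted payload
```

In this sketch each marked sample differs from the original by at most one quantization step, yet halving the gain is enough to make the recovered bits disagree with the payload.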

[0003] Watermarking is a form of steganography in which, typically, the signal hiding technique is intended to be robust against corruption by either normal processing or deliberate attack. As such, watermarks are valuable in applications related to security, such as copy protection or identification of content ownership. In such applications, the watermark may carry, for example, copy control status, copyright information, and information related to how the main program material was released. Even if the main program is subsequently stolen or illegally copied, ideally, the watermark remains embedded within the program material and provides a way to establish proof of ownership.

[0004] One or more watermarks may be inserted at many points along a “content” (e.g., audio or video performance) distribution path. Information added to the signal at the beginning of this path may contain copyright information or the mastering location, while information added at the end of the signal chain may contain playback information, such as date/time stamps and/or machine serial number. For content to be traced to its origin, watermarks may be embedded at various locations along the distribution path.

[0005] One important consideration for watermarking of audio and video signals is that the hidden signal should not unnecessarily degrade the quality of the signal in which it is hidden. Ideally, the watermark should be completely transparent; that is, the difference between the watermarked signal and the original signal should be imperceptible (to an unaided human observer). Of course, the difference must be detectable by some means, as otherwise the watermark signal is unrecoverable. However, watermarks may be intentionally perceptible for some applications. For example, images may be visibly watermarked in order to prevent commercial use. In addition, paper may be watermarked in order to convey a perceptible seal of authenticity.

[0006] Thus, the goals of watermarking may be summarized as follows:

[0007] modification of a primary signal in such a way as to add a secondary signal or supplemental information, resulting in a modified primary signal,

[0008] the difference between the original and the modified primary signal should be detectable but imperceptible, and

[0009] the modification should be difficult to remove or obscure.

Perceptual Coding

[0010] Perceptual coding is the science of removing perceptual irrelevancies from signals in order to reduce them to a more efficient form of expression. For example, in some applications, perceptual coding is used to reduce the transmission data rate of digital audio or video signals in order to meet a predetermined channel capacity constraint. Perceptual coding of audio and video signals is a well-established discipline, enabling audio and video signals to be reduced to relatively low data rates for efficient storage and transmission.

[0011] Many perceptual coders operate by analyzing the content of the original signal and identifying the perceptual relevance of each signal component. A modified version of the original signal is then created, such that the modified version may be expressed using a lower data rate than the original signal. Ideally, the difference between the original and modified signals is imperceptible. Noise, usually quantizing noise, or other distortion is controllably introduced in order to reduce the data rate of the signal. Properties of human perception are taken into account to manipulate the noise or other distortion so that it remains imperceptible or minimally perceptible.

[0012] Perceptual coders employ a masking model intended to reflect human perception to some degree of accuracy. The masking model provides a perceptual masking threshold that establishes a boundary for perceptibility. The solid line in FIG. 1 shows the sound pressure level at which sound, such as a sine wave or a narrow band of noise, is just audible, that is, the threshold of hearing. Sounds at levels above the curve are audible; those below it are not. This threshold is clearly very dependent on frequency. One is able to hear a much softer sound at, say, 4 kHz than at 50 Hz or 15 kHz. At 25 kHz, the threshold is off the scale: no matter how loud the sound is, one cannot hear it.

[0013] Consider the threshold, as shown by the dashed line in FIG. 1, in the presence of a relatively loud signal at one frequency, say a 500 Hz sine wave, shown as the vertical line in the figure. The threshold rises dramatically in the immediate neighborhood of 500 Hz, modestly somewhat further away in frequency, and not at all at remote parts of the audible range.

[0014] This rise in the threshold is called masking. In the presence of the loud 500 Hz sine wave signal (the “masking signal” or “masker”), signals under this threshold, which may be referred to as the “masking threshold”, are hidden, or masked, by the loud signal. Further away, other signals can rise somewhat in level above the no-signal threshold, yet still be below the new masked threshold and thus be inaudible. However, in remote parts of the spectrum in which the no-signal threshold is unchanged, any noise that was audible without the 500 Hz masker remains just as audible with it. Thus, masking is not dependent upon the mere presence of one or more masking signals; it depends upon where they are spectrally. Some musical passages, for example, contain many spectral components distributed across the audible frequency range, and therefore give a masked threshold curve that is raised everywhere relative to the no-signal threshold curve. Other musical passages, for example, consist of relatively loud sounds from a solo instrument having spectral components confined to a small part of the spectrum, thus giving a masked curve more like the sine-wave masker example of FIG. 1.

[0015] Masking also has a temporal aspect that depends on the time relationship between the masker(s) and the masked signal(s). Some masking signals provide masking essentially only while the masking signal is present (“simultaneous masking”). Other masking signals provide masking not only while the masker occurs but also earlier in time (“backward masking” or “premasking”) and later in time (“forward masking” or “postmasking”). A “transient”, a sudden, brief and significant increase in signal level, may exhibit all three “types” of masking: backward masking, simultaneous masking, and forward masking, whereas a steady-state or quasi-steady-state signal may exhibit only simultaneous masking.

[0016] All noise and distortion that is added by the perceptual coding process should remain below the masking threshold in order to avoid perceptible impairments. If the noise or distortion added by the coding process reaches, but does not exceed, the masking threshold, the signal is said to be coded at the level of “just noticeable difference”. The “coding margin” of a system may be defined as the amount by which the added noise or distortion lies beneath the masking threshold. A coding margin of zero means that the signal is coded at the level of just noticeable difference, a positive coding margin means that the added noise or distortion is imperceptible with some room to spare, and a negative coding margin means that perceptible impairments are present.

[0017] Note that different aspects of the signal (e.g., bandwidth, time resolution, spatial accuracy, etc.) may be coded to different degrees of accuracy, resulting in different coding margins for different signal characteristics. If a source signal is coded such that the coding margin is non-negative for all characteristics of the signal, it may be said to be perceptually equivalent to the source.
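The coding margin definitions above may be illustrated with a minimal sketch. It assumes, purely for illustration, that the masking threshold and the added noise for each signal characteristic are available as levels in decibels; the function names and the numeric values are hypothetical.

```python
def coding_margin_db(masking_threshold_db, added_noise_db):
    """Amount (in dB) by which the added noise lies beneath the masking threshold."""
    return masking_threshold_db - added_noise_db


def describe(margin_db):
    """Interpret a coding margin according to its sign."""
    if margin_db > 0:
        return "imperceptible with room to spare"
    if margin_db == 0:
        return "just noticeable difference"
    return "perceptible impairment"


def perceptually_equivalent(margins_db):
    """True when the coding margin is non-negative for every characteristic."""
    return all(m >= 0 for m in margins_db)


# Hypothetical (threshold, noise) levels in dB for three signal characteristics.
levels = {
    "bandwidth": (40.0, 34.0),
    "time resolution": (30.0, 30.0),
    "spatial accuracy": (25.0, 27.5),
}
margins = {name: coding_margin_db(t, n) for name, (t, n) in levels.items()}
equivalent = perceptually_equivalent(margins.values())
```

With these assumed values the margin for spatial accuracy is negative, so the coded signal would not be regarded as perceptually equivalent to the source.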

[0018] A perceptual coding system consists of an encoder that may communicate bit allocation information or perceptual model information along with coded data to a decoder. There are three main types of perceptual coding systems: forward adaptive, backward adaptive, and a hybrid of the two. In a forward adaptive system, the encoder explicitly sends bit allocation information to the decoder. A backward adaptive system does not send any bit allocation or perceptual model information to the decoder. The decoder recreates the bit allocation from the coded data. A hybrid system allows for some allocation information, such as a less than full resolution form of the perceptual model, to be included with the coded data, but much less than in a full forward adaptive system. A more detailed discussion of these three types of perceptual coding systems is set forth in “AC-3: Flexible Perceptual Coding for Audio Transmission and Storage,” by Craig C. Todd et al., Preprint 3796, 96th Convention of the Audio Engineering Society, Feb. 26-Mar. 1, 1994. Perceptual coding systems developed by Dolby Laboratories, such as the Dolby Digital and Dolby E coding systems, identified further below, are examples of hybrid forward/backward adaptive systems, while the MPEG-2 AAC coding system, also identified further below, is an example of a forward adaptive system.

[0019] The goals of perceptual coders may be summarized as follows:

[0020] modification of a primary signal resulting in a modified signal,

[0021] the difference between the original and the modified signal should be imperceptible, and

[0022] representation of the modified signal should be more efficient than representation of the original signal.

Security

[0023] Watermarking as a security measure is only as strong as the ability of the watermark to survive a direct attack. Many watermarking techniques currently in use attempt to shield themselves from successful attack by keeping the details of the watermark a secret, under the presumption that if the watermark is not publicly known, attackers will not know how to modify the watermarked signal to obscure the watermark data. This is a principle known as “security through obscurity.” In the field of cryptography, security through obscurity is generally dismissed as an illogical principle. If an algorithm or process derives its security through secrecy, it only takes one person to disclose the details of the technique for the security of the entire system to be compromised.

[0024] The goals of security may be summarized as follows:

[0025] protect content in such a way that stolen content is either unusable or enables subsequent proof of piracy and traceability to the source of the piracy,

[0026] be robust against attacks, and

[0027] maintain high security at even the weakest link in the system.

DISCLOSURE OF INVENTION

[0028] The present invention is directed to a method of modifying the operation of the encoder and/or the decoder of a perceptual coding system in response to supplemental information so that the supplemental information may be detectable in the output of the decoder. One or more parameters in the encoder and/or the decoder are modulated in response to the supplemental information.

[0029] In accordance with the present invention, supplemental information, such as watermark information, is conveyed by modulating one or more parameters in the encoder and/or the decoder of a perceptual coding system in order to cause a detectable, but preferably imperceptible, change in the output of the decoder. This information is “supplemental” in that it is in addition to the primary information, such as audio or video information, carried by the coding system. Typically, such supplemental information is in the nature of a “watermark”, although it need not be. Modulation of one or more parameters may be said to “embed” the supplemental or watermark information in the encoded signal (in the case of modulating parameters in the perceptual encoder) and in the decoded signal (in the case of modulating parameters in the perceptual encoder and/or the perceptual decoder).

[0030] Although certain implementations of the invention, when implemented at least partly in an encoder, may indirectly modify bitstream data representing the primary information, the invention does not contemplate the direct modification of bitstream data representing primary information (nor the modification of the primary information that becomes bitstream data after quantization in the perceptual encoder). The invention contemplates detection of the supplemental information in the perceptual decoder output (whether such information is conveyed as the result of actions in the encoder and/or the decoder) rather than in the undecoded bitstream.

[0031] By “modulating” we mean varying the value of a parameter between or among one or more values (states), wherein said values may include a “default value”, which is the value the parameter otherwise would have had were it not for the action of the present invention. For example, the parameter value may be varied between or among its default value and one or more other values (in the case of a parameter having only two possible values, such a parameter sometimes being referred to as a “flag”, the parameter may be varied between those two values), or it may be varied between or among one or more other values, which values do not include the default value.

[0032] By “modulating in response to” supplemental information or a watermark signal or sequence we mean that the modulation of a parameter is controlled by the supplemental information or watermark signal or sequence either directly or indirectly, such as when the control is modified by a function of one or more other signals, the signals including, for example, a set of instructions such as a deterministic sequence or the input signal applied to the coding system.

[0033] By “parameter” we mean a variable within a perceptual coding system that is not bitstream data representing primary information. Examples of Dolby Digital (AC-3), MPEG audio, and MPEG video parameters that are suitable for modulating in accordance with aspects of the present invention are shown below in the tables of FIGS. 6, 7 and 8, respectively. The invention also contemplates the modulation of one or more parameters that are not recognized in published perceptual coder standards, including parameters yet to be defined.

[0034] By “bitstream data representing primary information” we mean data bits in the encoded bitstream, generated by the perceptual encoder but not yet decoded, that carry the primary information, such as audio or video information. Bitstream data representing primary information includes, for example, in the case of a Dolby Digital (AC-3) system, exponents and mantissas, and, in the case of an MPEG-2 AAC system, scale factors and Huffman encoded coefficients.

[0035] In complex perceptual coding systems (e.g., Dolby Digital and Dolby E audio, MPEG audio, MPEG video, etc.), a large number of independent coding parameters provide a significant degree of coding flexibility. “Dolby”, “Dolby Digital” and “Dolby E” are trademarks of Dolby Laboratories Licensing Corporation.

[0036] Details of Dolby Digital coding are set forth in “Digital Audio Compression Standard (AC-3),” Advanced Television Systems Committee (ATSC), Document A/52, Dec. 20, 1995 (available on the World Wide Web of the Internet at www.atsc.org/Standards/A52/a_52.doc). See also the Errata Sheet of Jul. 22, 1999 (available on the World Wide Web of the Internet at www.dolby.com/tech/ATSC_err.pdf).

[0037] Details of Dolby E coding are set forth in “Efficient Bit Allocation, Quantization, and Coding in an Audio Distribution System”, AES Preprint 5068, 107th AES Conference, August 1999, and “Professional Audio Coder Optimized for Use with Video”, AES Preprint 5033, 107th AES Conference, August 1999.

[0038] Details of MPEG-2 AAC coding are set forth in ISO/IEC 13818-7:1997(E) “Information technology - Generic coding of moving pictures and associated audio information - Part 7: Advanced Audio Coding (AAC),” International Standards Organization (April 1997); “MP3 and AAC Explained” by Karlheinz Brandenburg, AES 17th International Conference on High Quality Audio Coding, August 1999; and “ISO/IEC MPEG-2 Advanced Audio Coding” by Bosi et al., AES Preprint 4382, 101st AES Convention, October 1996.

[0039] An overview of various perceptual coders, including Dolby encoders, MPEG encoders, and others, is set forth in “Overview of MPEG Audio: Current and Future Standards for Low-Bit-Rate Audio Coding,” by Karlheinz Brandenburg and Marina Bosi, J. Audio Eng. Soc., Vol. 45, No. 1/2, January/February 1997.

[0040] Specific default values for perceptual coding parameters are generally chosen by the coding system based on the characteristics of the input signal. However, there is usually more than one way to select coding parameter values that produce decoded signals having no perceptible differences, and such variations in coding parameter values may result in decoded signals with detectable, yet imperceptible, differences. Note that imperceptibility refers to human perception whereas detectability is based on the capabilities of a non-human detector.

[0041] A supplemental signal or watermark detector recovers the embedded information contained within the reproduced (decoded) signal. In the case of audio signals, for example, the detection may be accomplished acoustically in some cases, while electronic detection may be required in other cases. Electronic detection may be in the digital or analog domains. Electronic detection in the digital domain may be in the time or frequency domain of the decoded output or may be in the frequency domain within the decoder prior to frequency-to-time conversion. Extracting the watermark after acoustic processing is considered a more difficult challenge because of the addition of room noise, speaker and microphone characteristics, and overall playback volume.

[0042] Many practical perceptual coding systems do not meet the requirement of keeping added noise beneath the level of just noticeable difference. Perceptibility requirements in perceptual coding systems are often relaxed to meet bit-rate targets or complexity limits. In these cases, although noise added during perceptual coding may be perceptible, there likely will be values other than default values to which coding parameters may be modulated that will not render the already perceptible noise any more perceptible. Although the modulation of a parameter may result in substantially no perceptible change in perceived noise, it may nevertheless result in a detectable change in the decoded signal.

[0043] Preferably, in accordance with aspects of the present invention, one or more parameters are modulated so that the effects of the modulation cause the noise and distortion added by perceptual coding to be close to, but below, the level of just noticeable difference in all or part of the frequency spectrum (“distortion”, in this sense, is the difference between the coded and original signals, and may or may not result in audible artifacts). Therefore, it would be difficult to remove or obscure the resulting effects of modulating one or more parameters without exceeding the masking threshold and creating a perceptible impairment. On the other hand, if an attack were below the masking threshold, then part of the effects of parameter modulation likely will remain.

[0044] As suggested above, aspects of the present invention may also be employed when the encoder does not encode the primary source signal so that noise and distortion are below the level of just noticeable difference. In this case, the source signal is encoded in such a way that it is impaired relative to the source, and the parameter modulation introduces impairments in the decoded signal that are different from a detection standpoint, but, preferably, are substantially the same perceptibly. As in the previous case, it would be difficult to remove or obscure the resulting effects of the parameter modulation in the decoded signal without exaggerating the impairment or introducing additional impairments with a greater degree of perceptibility.

[0045] The approach of the present invention is fundamentally different from techniques that apply a watermark prior to perceptual encoding. In those techniques, even though the coding system may contain enough coding margin to convey a watermark, there is no guarantee that the particular method chosen to convey the a priori watermark coincides with the location of the perceptual coding system's coding margin. Because such prior systems operate independently, they may occasionally interact badly, introducing perceptible impairments or causing the watermark to be obscured.

[0046] As mentioned above, perceptual encoders reduce the data rate of an input signal by removing perceptually redundant information. For example, a constant data rate encoder reduces a fixed rate of input information to a lower fixed rate of information. Part of this data reduction requires a function sometimes characterized as a “rate control” that ensures that the encoder output does not exceed the final fixed information size. The rate control reduces information until it has achieved the final encoded size.

[0047] In some perceptual encoders, a distortion measurement is paired with the rate control to ensure that the correct information is discarded. A distortion measurement compares the original input signal with the encoded signal (output of the rate control). The distortion measure may be used to control coding parameters to change the outcome of the rate control process.
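The interplay of rate control and a distortion measure may be sketched, under stated assumptions, as a toy greedy loop. This is not the rate control of any particular coder: the per-band bit counts, the per-band coding margins, and the figure of roughly 6.02 dB of added quantizing noise per bit removed are all illustrative assumptions, as are the function and variable names.

```python
def rate_control(bits_per_band, margin_db, budget_bits, db_per_bit=6.02):
    """Remove bits one at a time until the total fits the budget.

    At each step the bit is taken from the band whose coding margin is
    currently largest, on the assumption that this band can best absorb
    the additional quantizing noise.
    """
    bits = list(bits_per_band)
    margin = list(margin_db)
    while sum(bits) > budget_bits:
        candidates = [i for i, b in enumerate(bits) if b > 0]
        if not candidates:
            break  # nothing left to discard
        i = max(candidates, key=lambda k: margin[k])
        bits[i] -= 1              # discard one bit of resolution in band i
        margin[i] -= db_per_bit   # its noise rises, so its margin shrinks
    return bits, margin


# Hypothetical starting allocation for three bands and a 22-bit budget.
final_bits, final_margin = rate_control([8, 8, 8], [12.0, 3.0, 0.0], 22)
```

In this sketch the distortion measure is reduced to a single margin value per band; a practical encoder would recompute it from the original and encoded signals after each adjustment.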

[0048] The distortion rate control aspect of the present invention seeks to solve the problem of how to embed a watermark in a perceptual encoder while maximizing the strength and minimizing the perceptibility of the embedded signal. In one embodiment, the present invention also allows a user to choose the strength, or energy, of the embedded signal by adjusting a parameter in the watermark embedding process.

[0049] In addition to parameter modulation, aspects of the present invention employ a set of instructions such as a deterministic sequence to vary certain aspects of the parameter modulation and, hence, characteristics of the resulting watermark. Deterministic sequences are generated by mathematical processes that produce sequences of binary ones and zeros computed given a defining equation (the generator equation) and an initial state (the key). A number of alternative aspects of the invention employing deterministic sequences are disclosed. These techniques may improve the imperceptibility of the watermark and also may improve the robustness of the watermark, which is an interesting and useful result inasmuch as many other techniques that improve imperceptibility tend to degrade robustness. Finally, these techniques may improve security, in the sense that it becomes possible to reveal all aspects of the watermarking system (except for the deterministic sequence key) without sacrificing the robustness of the system.
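One familiar example of a deterministic sequence defined by a generator equation and an initial state is the output of a linear feedback shift register. The following minimal sketch assumes a 4-bit register with feedback taken from bit positions 0 and 1; these choices, the key value, and the function name are illustrative assumptions and are not prescribed by the present disclosure.

```python
def lfsr_bits(key, taps, width, count):
    """Generate `count` bits from a Fibonacci linear feedback shift register.

    key   -- non-zero initial state of the register (the "key")
    taps  -- bit positions XORed together to form the feedback bit
             (standing in for the "generator equation")
    width -- register length in bits
    """
    if not 0 < key < (1 << width):
        raise ValueError("key must be a non-zero value that fits the register")
    state = key
    out = []
    for _ in range(count):
        out.append(state & 1)                 # output the lowest bit
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1      # XOR of the tapped bits
        state = (state >> 1) | (feedback << (width - 1))
    return out


# Assumed example: 4-bit register, taps at bits 0 and 1, key 0b1001.
sequence = lfsr_bits(key=0b1001, taps=(0, 1), width=4, count=30)
```

The same key always reproduces the same sequence, so a detector that holds the key can regenerate it, while the generator itself may be disclosed without revealing which sequence is in use.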

[0050] Deterministic sequence aspects of the present invention mayinclude one or more of the following acts:

[0051] Using a deterministic sequence to modify the rate of parameter modulation transitions and, consequently, the watermark symbol transition rate (see Table 1, below),

[0052] Using a deterministic sequence to select the parameter(s) for modulation (see Table 2, below), and

[0053] Using a deterministic sequence to modify the rate at which the choice of parameters for modulation changes (see Table 3, below).
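The acts listed above may be pictured with a short sketch in which a keyed sequence decides, for each watermark bit, which parameter carries it and for how many frames it is held. Python's seeded `random.Random` is used here only as a convenient stand-in for a deterministic sequence generator; the parameter names, the hold durations, and the frame-based schedule are hypothetical.

```python
import random

# Hypothetical names for parameters available for modulation.
PARAMETERS = ["snr_offset", "fast_gain_code", "coupling_begin_frequency"]


def modulation_schedule(key, watermark_bits, num_frames, hold_choices=(1, 2, 4)):
    """Build a list of (frame, parameter, bit) entries driven by a keyed sequence.

    For each watermark bit, the sequence selects the parameter to modulate and
    the number of frames for which that choice is held, so both the parameter
    selection and the symbol transition rate vary deterministically.
    """
    rng = random.Random(key)   # same key -> same sequence of choices
    schedule = []
    frame = 0
    for bit in watermark_bits:
        if frame >= num_frames:
            break
        param = rng.choice(PARAMETERS)
        hold = rng.choice(hold_choices)
        for _ in range(hold):
            if frame >= num_frames:
                break
            schedule.append((frame, param, bit))
            frame += 1
    return schedule


schedule_a = modulation_schedule(key=1234, watermark_bits=[1, 0, 1, 1], num_frames=8)
schedule_b = modulation_schedule(key=1234, watermark_bits=[1, 0, 1, 1], num_frames=8)
```

Because the schedule depends only on the key and the watermark bits, an embedder and a detector sharing the key arrive at the same schedule independently.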

[0054] In addition, alternative aspects of the present invention include acts of using characteristics of the source signal to control parameter modulation and/or choice of parameters for modulation. Source-signal-responsive aspects of the present invention may include one or more of the following acts:

[0055] Using characteristics of the source signal to variably modify the parameter modulation rate and, consequently, the watermark symbol transition rate (see part a of Table 4, below),

[0056] Using characteristics of the source signal to variably modify the rate at which the choice of parameters for modulation changes (see part b of Table 4, below), and

[0057] Using characteristics of the source signal to variably modify the number of parameters in the available set of parameters for modulation (see Table 5, below).

[0058] As explained further below, both a deterministic sequence and characteristics of the source signal may be used in connection with modulating parameters according to alternative aspects of the present invention. See Tables 6, 7 and 8, below.

[0059] For some implementations of the invention, watermark detection in the output of the perceptual decoder is likely to require access to the primary information applied to the encoder. For some other implementations of the invention, watermark detection may be performed without having access to the original primary information, at the expense of greater complexity in the detection.

[0060] It is often desirable to apply a unique, or “serialized”, watermark (e.g., a serial number) at the point where signals are delivered to an audience. In accordance with aspects of the present invention, supplemental information or a watermark is embedded during the perceptual decoding process. One or more parameters are modulated in the decoder prior to inverse quantization.

[0061] Imperceptibility may be maintained if the noise or distortion added by the decoder parameter modulation process does not exceed a perceptual threshold. In order to embed a watermark imperceptibly as part of the decoding process, a perceptual threshold is used. Many perceptual coders transmit perceptual models from the encoding process to the decoding process in some form or another; however, other coders provide only approximations or coarse representations of the perceptual threshold. The most accurate perceptual threshold is derived from the unquantized, source spectral coefficients, but the data rate increase is significant if such data is transmitted to the decoder. Alternatively, the perceptual threshold provided to the decoder in a perceptual coding system may be an exponent of a mantissa in which the exponent represents the information sample having the maximum energy within a critical band (as in the Dolby Digital system). In order to improve the accuracy of the perceptual threshold in the decoder, exponents may be transmitted from the encoder that are based on an average of sample energy in a band instead of the maximum energy in the band.

[0062] Although modulating parameters in the decoder is similar to modulating parameters in the encoder in many respects, there is less flexibility. For example, modulating one or more parameters in a decoding system may require that care be taken when reformulating the bit allocation information based on the coding parameters. Furthermore, it is more difficult to render imperceptible the effects of parameter modulation in the decoder. One reason for this is that, at least in the case of an ideal encoder, the encoding process has already added quantization error up to the threshold of perceptibility. However, this is not always the case, as coding margin may exist, for example, due to imperfections in the perceptual model, a positive signal-to-noise ratio offset, or signal conditions.

BRIEF DESCRIPTION OF DRAWINGS

[0063] FIG. 1 is an idealized plot showing (solid line) the sound pressure level at which sound is just audible (the threshold of hearing) when no masking signals are present and showing (dashed line) the threshold of hearing in the presence of a 500 Hz sine wave.

[0064] FIG. 2 is a functional block diagram illustrating the basic principles of the present invention in which supplemental information modulates one or more parameters of a perceptual encoder function and/or a perceptual decoder function in a perceptual coding system.

[0065] FIG. 3A is a functional block diagram illustrating an aspect of the present invention that includes a supplemental information detector function receiving the output of the coding system.

[0066] FIG. 3B is a functional block diagram illustrating, with more detail of the detector function, the aspect of the present invention that includes a supplemental information detector function receiving the output of the coding system.

[0067] FIG. 4 is a functional block diagram illustrating an aspect of the present invention that includes a supplemental information detector function receiving both the output of the coding system and the input to the coding system.

[0068] FIG. 5 is a functional block diagram illustrating an aspect of the present invention in which the supplemental information detector function includes not only a comparator function, but also a perceptual encoder function and a perceptual decoder function, neither of which has its parameters modulated.

[0069] FIG. 6 is a table showing parameters suitable for modulation in certain perceptual audio coders of the hybrid forward/backward adaptive type.

[0070] FIG. 7 is a table showing parameters suitable for modulation in certain perceptual audio coders of the forward adaptive type.

[0071] FIG. 8 is a table showing parameters suitable for modulation in certain perceptual video coders.

[0072] FIG. 9 is a schematic representation of certain parameters that spectrally model the human ear's masking curve (spectral masking model parameters) in certain perceptual audio coders.

[0073] FIG. 10 is a schematic representation of the spectral masking model parameters capable of being modulated in a class of perceptual audio coders.

[0074] FIG. 11A is an idealized representation showing the modulation of the SNR offset parameter (a masking threshold parameter) in the presence of a sine wave signal in certain perceptual audio coders.

[0075] FIG. 11B is an idealized representation showing the effect in the output of the perceptual decoder when the SNR offset parameter is modulated in the manner shown in FIG. 11A for the case of a bit-constrained coding system.

[0076] FIG. 11C is an idealized representation showing the effect in the output of the perceptual decoder when the SNR offset parameter is modulated in the manner shown in FIG. 11A for the case of a coding system that is not bit constrained.

[0077] FIG. 11D shows the legends employed in FIGS. 11A-C and 12A-C.

[0078] FIG. 12A is an idealized representation showing the modulation of the fast gain code parameter (a masking threshold parameter) in the presence of a sine wave signal in certain perceptual audio coders.

[0079] FIG. 12B is an idealized representation showing the effect in the output of the perceptual decoder when the fast gain code parameter is modulated in the manner shown in FIG. 12A for the case of a bit-constrained coding system.

[0080] FIG. 12C is an idealized representation showing the effect in the output of the perceptual decoder when the fast gain code parameter is modulated in the manner shown in FIG. 12A for the case of a coding system that is not bit constrained.

[0081] FIG. 13 is an idealized representation showing the effects, in certain perceptual audio coders, of modulating parameters other than masking parameters, namely, the “coupling in use” flag, the “rematrixing in use” flag and the coupling begin frequency code.

[0082] FIG. 14 is an idealized representation showing the effects, in certain perceptual audio coders, of modulating a parameter other than a masking parameter, namely, the phase flag.

[0083] FIG. 15 is a series of idealized waveforms showing time-domain alias window shapes for embedding supplemental information during encoding.

[0084] FIG. 16 is a series of idealized waveforms showing time-domain alias window shapes for embedding supplemental information during decoding.

[0085] FIG. 17 is an idealized temporal envelope response, plotting sound pressure level (SPL) versus time, illustrating the temporal masking effects of a masking signal.

[0086] FIG. 18 is an idealized representation showing the type of modulations that can be applied to a signal such that the effects are constrained within a temporal masking envelope.

[0087] FIG. 19 is a series of idealized amplitude versus frequency plots illustrating how a 2-bit symbol may be represented by four different bandwidths.

[0088] FIG. 20 is an idealized frequency versus time plot showing an example of an audio signal that contains an embedded signal using the bandwidth of the signal to represent different symbols.

[0089] FIG. 21 is an idealized amplitude versus frequency plot illustrating the addition of noise shaped to the approximate level of the human hearing threshold in the presence of a sine wave signal.

[0090] FIG. 22 is an idealized energy versus frequency plot showing three different energy levels required for detecting four different bandwidths that create a 2-bit symbol.

[0091] FIG. 23 is an idealized amplitude versus energy plot showing several example histograms of the distribution of ‘high’ and ‘low’ states.

[0092] FIGS. 24-26 are logic flow diagrams showing a process for embedding a watermark using a threshold of perceptibility.

[0093] FIG. 24 is a logic flow diagram showing the inner iteration loop portion of the process for embedding a watermark using a threshold of perceptibility.

[0094] FIG. 25 is a logic flow diagram showing the outer iteration loop portion of the process for embedding a watermark using a threshold of perceptibility, in which outer loop spectral coefficients are amplified.

[0095] FIG. 26 is a logic flow diagram showing the modification of the process of FIG. 25 to fulfill the psychoacoustic model, or perceptual threshold, as much as possible while also embedding the supplemental information or watermark signal.

[0096] FIG. 27 shows a series of idealized waveforms depicting, across a frequency spectrum, the perceptual threshold, quantizer error and modified quantizer error, illustrating how a watermark may be embedded using a distortion measuring process for the case of modulating a parameter that affects quantizer error within a critical band.

[0097] FIG. 28 shows a series of idealized waveforms depicting, across a frequency spectrum, the perceptual threshold, quantizer error and modified quantizer error, illustrating how a watermark may be embedded using a distortion measuring process for the case of modulating a parameter that affects signal to noise ratio offset throughout the frequency spectrum.

[0098] FIG. 29 is a logic flow diagram, illustrating the steps of a process of embedding a watermark during decoding, in accordance with aspects of the present invention.

[0099] FIG. 30 is a functional block diagram showing other aspects of the invention in which control of the modulation by the supplemental information or watermark is modified by a function of one or more other signals or data sequences including, for example, a deterministic sequence and/or the input signal applied to the coding system.

BEST MODE FOR CARRYING OUT THE INVENTION

[0100] FIG. 2 is a functional block diagram illustrating the basic principles of the present invention. A perceptual encoder function 2 and a perceptual decoder function 4 comprise a perceptual coding system. Primary information, such as audio or video information, is applied to the perceptual encoder function 2. The encoder function 2 generates a digital bitstream that is received by the perceptual decoder function 4. One or more parameters in the encoder function and/or the decoder function are modulated in response to supplemental information (e.g., a watermark signal or sequence). Because supplemental information may be applied either to the encoder function or to the decoder function or to both, dashed lines are shown from the supplemental information to the encoder function and to the decoder function, respectively. The output of the perceptual decoder function is primary information with embedded supplemental information. The supplemental information may be detectable in the decoder function output.

[0101] If supplemental information is applied to both the encoder function 2 and the decoder function 4, typically, the information applied to one will be different from that applied to the other. For example, the supplemental information controlling the one or more encoder function parameters might be a watermark identifying the owner of the audio or video content and the supplemental information controlling the one or more decoder function parameters might be a serial number identifying the equipment that presents the audio or video content to one or more consumers. Typically, the supplemental information would be applied to the encoder function and the decoder function at different times.

[0102] FIGS. 3-5 are functional block diagrams illustrating the basic principles of an aspect of the present invention that includes a detector function for detecting the supplemental information in the output of the decoder function. Detection may be accomplished in the digital domain or the analog domain (electrical or acoustical) of the decoder function output. Detection may also be accomplished in the digital domain of the decoder function after decoding but prior to the frequency domain to time domain conversion.

[0103] FIG. 3A is the same as FIG. 2 except that it includes a detector function 6 that receives the output of the decoder function 4 and detects the supplemental information in that output. The output of detector function 6 is the supplemental information. FIG. 4 is the same as FIG. 3A except that it includes a detector function 8 receiving not only the output of the decoder function 4 but also the same primary information applied to the encoder function. The essential function of the detector function 8 is to compare the original input information applied to the encoder function with the output of the decoder function in order to provide as its output the supplemental information. FIG. 5 is a variation of the FIG. 4 arrangement. In FIG. 5, as in FIG. 4, a detector function 10 receives the output of decoder function 4 and the primary information applied to the encoder function 2. However, detector function 10 differs from detector function 8 in that it includes not only a comparator function 12, but also a perceptual encoder function 14 and a perceptual decoder function 16. Encoder function 14 is the same as encoder function 2 except that its parameters are not modulated. Decoder function 16 is the same as decoder function 4 except that its parameters are not modulated. Thus the act of detecting the supplemental information in the output of the decoder is accomplished by one of the following acts:

[0104] observing the decoded signal,

[0105] comparing the decoded signal to the signal applied to the encoder function, and

[0106] comparing the decoded signal to the decoded signal from a substantially identical perceptual coding system in which no parameters in the encoder function or decoder function are modulated in response to supplemental information.

[0107] The detection arrangement of FIG. 3A is most suitable for detecting the effects of certain types of parameter modulation, such as when a bandwidth parameter is modulated (modulating bandwidth parameters is described in detail below). In order to detect the effects of modulating most parameters, it is necessary to compare the primary information applied to the encoder with the primary information carrying embedded supplemental information provided by the decoder, as in the arrangements of FIGS. 4 and 5. The FIG. 5 arrangement makes it possible to do a more rigorous comparison because the only differences between the compared information will be those caused by the parameter modulation. In the FIG. 4 arrangement, the differences include other effects that may be introduced by the perceptual encoding and decoding processes.

[0108] Because the detection arrangement of FIG. 3A does not require access to the primary information applied to the perceptual encoder, it may be accomplished in real time or near real time, depending on which encoder and/or decoder parameters are modulated. For example, modulating a bandwidth parameter may allow detection by analyzing only the decoder output in real time or near real time. Particularly, detector function 6 of the FIG. 3A arrangement may include one or more delay functions so that the output of the decoder function 4 may be compared against itself. For example, as shown in FIG. 3B, the detector function 6 may include a comparator function 12′ and one or more delay functions 7, 7′, etc. so that the act of observing the decoded signal comprises comparing the decoded signal to a time delayed version of itself. Energy states from one or more previous blocks are subjected to a comparator function that uses a threshold to determine the symbol, in the manner, for example, of the bandwidth modulation detection described below. The block lengths are known by the detector and some form of synchronization must occur in order to align the expected symbol rate with the actual symbol rate. Modulation of other parameters may not allow detection in real time or near real time or may require comparing the decoder output to the encoder input signal as in the arrangements of FIGS. 4 and 5.

[0109] In arrangements such as those of FIGS. 4 and 5 in which the decoder output is compared to the encoder input, it is important to synchronize the input and output signals. Depending on which parameter or parameters are chosen for modulation and on the supplemental information data rate, it may be necessary to provide a high degree of synchronization between those signals. One way to do so is to embed a deterministic sequence, such as a PRN sequence, in the primary signal so that the sequence is also embedded in the decoder output. By comparing the sequence in the input and output signals, a fine-grained synchronization is possible.
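By way of illustration only, the synchronization described above might be sketched as a cross-correlation search. The function name `estimate_lag`, the use of a ±1 pseudo-random sequence and the use of NumPy are assumptions made for this sketch and are not taken from the specification; a practical detector would also have to cope with coding distortion.

```python
import numpy as np

def estimate_lag(reference, received):
    """Return the delay (in samples) of `reference` within `received`.

    Hypothetical helper: the lag is taken at the peak of the full
    cross-correlation between the two sequences.
    """
    corr = np.correlate(received, reference, mode="full")
    # Index 0 of the "full" output corresponds to lag -(len(reference) - 1).
    return int(np.argmax(corr)) - (len(reference) - 1)

# Illustrative use: a +/-1 PRN sequence delayed by 37 samples, plus mild noise.
rng = np.random.default_rng(0)
prn = rng.choice([-1.0, 1.0], size=511)
received = np.concatenate([np.zeros(37), prn, np.zeros(20)])
received += 0.1 * rng.standard_normal(received.size)
lag = estimate_lag(prn, received)
```

Once the lag is known, the decoder output can be shifted by that amount before it is compared with the encoder input.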

[0110] Detection may be accomplished manually or, in some cases, automatically. Use of a PRN sequence in the primary signal may facilitate automatic detection. If done manually, visual aids such as a spectral analysis of compared signals may be employed.

[0111] Some examples of the coding parameters that may be modulated to embed a watermark are set forth in several tables: a first table, shown in FIG. 6 (Dolby audio coder parameters), a second table, shown in FIG. 7 (MPEG audio coder parameters), and a third table, shown in FIG. 8 (MPEG video coder parameters). For each category of parameter (e.g., “Masking model and bit allocation”), the respective table indicates the type of parameter (e.g., “SNR offset”), the specific parameter(s) (e.g., “csnroffst”, “fsnroffst”, etc.), whether the parameter(s) is (are) susceptible to modulation in the encoder and/or in the decoder, and the resulting change in signal characteristics of the watermark in the decoded signal when the parameter(s) is (are) modulated. In the first column of the table shown in FIG. 6, there are six categories of parameters addressed: masking model and bit allocation, coupling between or among channels, frequency bandwidth, dither control, phase relationship, and time/frequency transform window. Note that in the first table, rematrixing can only be performed during decoding if rematflg is “0” (no rematrixing in the encoder) and in the second table, M/S coding can only be performed during decoding if ms_used is “0” (no M/S coding in the encoder).

[0112] Where a type of parameter has one or more parameters in a coding system, recognized abbreviations for the respective parameters are shown in parentheses. Thus, for example, the “SNR offset” type of parameter includes four parameters in Dolby Digital: “csnroffst” (coarse SNR offset), “fsnroffst” (channel fine SNR offset), “cplfsnroffst” (coupling fine SNR offset), and “lfesfsnroffst” (low frequency effects channel fine SNR offset). These and other Dolby Digital coding parameters are explained further in the A/52 Document cited above. While most of the listed Dolby audio coder parameters are common to the Dolby Digital and Dolby E coding systems and, thus, are explained in the A/52 Document, a few are unique to the Dolby E coding system (e.g., Back gain code (backgain) and Back decay code (backleak)). Further information about backgain and backleak is provided below.

[0113] In the first column of the table shown in FIG. 7, there are four categories of parameters addressed: masking model and bit allocation, coupling between or among channels, temporal noise shaping filter coefficients, and time/frequency transform window. Likewise, in the first column of the table shown in FIG. 8, there are two categories of parameters addressed: frame type and motion control. Further information about listed MPEG audio coder and video coder parameters is set forth in the above-cited ISO/IEC document, MPEG-2 AAC papers, and in other published MPEG documents. Aspects of the present invention are applicable not only to Dolby and MPEG perceptual coding systems, but also to other perceptual coding systems in which parameters in the encoder and/or decoder may be modulated. Examples of other perceptual coders are discussed in the above-referenced journal article by Brandenburg and Bosi (J. Audio Eng. Soc., 1997).

Modulating Perceptual Hearing Model Parameters

[0114] In perceptual audio coding systems, such as Dolby Digital and Dolby E, there are parameters that represent the perceptual hearing model or masking model and are used in the bit allocation process. In particular, certain parameters spectrally model the human ear's masking curve: a downwards masking curve steeply decaying with respect to frequency, an upwards masking curve steeply decaying with respect to frequency, and an upwards masking curve gradually decaying with respect to frequency. These are shown schematically in FIG. 9. Although spectral masking is a frequency domain concept, the standard nomenclature relating to these masking parameters employs time domain terminology (“slow” and “fast”, for example).

[0115] Referring to FIG. 9, the coding parameter elements that correspond to the spectral masking model are defined by their level and slope (gain and leak, respectively) with respect to a masking signal as follows:

[0116] Downward masking curve: backgain/backleak.

[0117] Upwards masking curve (fast): fastgain/fastleak.

[0118] Upwards masking curve (slow): slowgain/slowleak.

[0119] Note that backgain and backleak are parameters specified in Dolby E coding, but are not parameters specified in Dolby Digital coding. In Dolby Digital, as described in the above-cited A/52 document, the fastgain parameters are the fast gain codes (fgaincod, cplfgaincod and lfegaincod); the fastleak parameters are the fast decay codes (fdcycod and cplfleak); the slowgain parameter is the slow gain code (sgaincod); and the slowleak parameters are the slow decay codes (sdcycod and cplsleak).

[0120] Each of the parameters defined above is suitable for modulation in order to convey a watermark during perceptual coding. The modulation of any one of them slightly alters the spectral masking model and thus influences the bit allocation process. Thus, the masking model parameters are tightly coupled with the primary input signal so as to make the watermark robust. FIG. 10 provides an illustration of the parameters of the spectral masking model capable of being modulated.

[0121] Certain other parameters in the Dolby Digital and Dolby E coding systems control the overall signal-to-noise ratio (SNR). In Dolby Digital these parameters are the SNR offset parameters: csnroffst, fsnroffst, cplfsnroffst, and lfesfsnroffst. The SNR parameters exist to maintain a desired minimum level of signal-to-noise headroom between the signal and the quantization noise. These parameters affect the entire spectrum uniformly, unlike the spectral masking model parameters that primarily affect only a portion of the spectrum relative to a masking signal.

[0122] Yet other parameters act as a fine SNR adjustment on a critical band basis, termed “banded SNR”, or delta bit allocation: namely, deltba and cpldeltba in Dolby Digital coding.

[0123] FIGS. 11A through 11C and 12A through 12C provide illustrations of modulating a perceptual coding system's masking threshold (modulation of the SNR offset in FIG. 11A and modulation of the fast gain code in FIG. 12A), the resulting effect of the modulation when the coding system is bit-constrained (FIG. 11B and FIG. 12B, respectively), and the resulting effect of the modulation when the coding system is not bit constrained (FIG. 11C and FIG. 12C, respectively). FIG. 11D identifies the legends employed in FIGS. 11A-11C and 12A-12C. Bit constraints occur when the coder is restricted to producing coded blocks having the same length, which is a requirement of many transmission channels. When the coder is able to vary the number of bits from block to block, there is no effective constraint on the number of bits used to represent the signal. As shown (FIGS. 11B and 12B), in a bit-constrained coder, the decoded signal's quantizer error does not exactly match the masking threshold at all frequencies; the example illustrates that more than the necessary bits exist (the gap between the threshold and the decoded signal), resulting in positive margin between the masking threshold and the original quantizer error at some frequencies. Without bit constraints, the coder is able to exactly match the quantizer error to the masking threshold throughout the frequency band. For the default parameter value, the intended watermark symbol may be a bit value of “0”. For the modulated parameter value, the intended symbol may be a bit value of “1”, as in this example. FIGS. 11A and 12A show the masking threshold before and after modulation. FIGS. 11B, 11C, 12B and 12C show the resulting decoded signal. The modulated masking threshold is overlaid in FIGS. 11B/12B and 11C/12C to provide a comparison with the modulated decoded signal spectrum.

Modulating Non-Masking Parameters

[0124] FIGS. 13 and 14 provide illustrations of the signal characteristics that result from modulating parameters other than masking parameters in Dolby coders. In each of the figures, the signal characteristic is illustrated using a default parameter value and a modulated parameter value. In FIG. 13, the effects of modulating coupling parameters are shown. For each block in time, which is denoted on the horizontal axis, there are illustrated two channels labeled left and right. When the coupling in use flag is “0”, each channel is treated independently. When the coupling in use flag is “1”, the two channels are combined into a single coupling channel above a certain frequency, denoted by the cplbegf parameter. In addition to the coupling in use flag, the coupling begin frequency may also be modulated, which is also shown in FIG. 13.

[0125] In FIG. 14, the effects of modulating the phase flag are illustrated. When the phase flag is equal to “0”, the phase is not modified, but if the flag is equal to “1”, the phase of the signal is shifted by 180 degrees.

Modulating TDAC Window Parameters

[0126] As explained above, perceptual encoders reduce the data rate of an input signal by removing perceptually redundant information. These systems start by decomposing the input signal into one or more components, and then use perceptual analysis to determine how much accuracy each of these components requires in order for the difference between the source and coded material to be imperceptible (or to achieve an acceptable level of perceptibility) after the quantized components are decoded. One example of such a system is a transform coder that converts temporal samples to a frequency-based representation using a time-domain aliasing cancellation (TDAC) transform. In order to assure perfect reconstruction, the time-domain samples are processed using overlapping windows prior to transformation. After the transform, the frequency samples are quantized and encoded in a way that reduces the data rate while keeping the resulting errors perceptually insignificant upon decoding. To maintain perfect reconstruction after the inverse transform process in the decoder, the time-domain samples are windowed, overlapped and added using parameters matched to those that were used in the encoder. Generally, the window parameters for the encode and decode windows are chosen such that when they are applied during the forward and reverse TDAC transforms, aliasing is minimized or removed. Details regarding transform coding using TDAC transforms are set forth in “Analysis/Synthesis Filter Bank Design Based on Time Domain Aliasing Cancellation” by Princen and Bradley, IEEE Trans. on Acoustics, Speech, and Signal Processing, Vol. ASSP-34, No. 5, October 1986, pp. 1153-1161, and “Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation” by Princen et al, Proceedings: ICASSP 87, 1987 Intl. Conf. on Acoustics, Speech, and Signal Processing, April, 1987, Dallas, Tex., pp. 2161-2164.

[0127] A watermark may be applied by modulating the parameters of a time-domain window used in the construction or reconstruction of the transformed signal. For example, a mismatch between the slope, or alpha (α), of the time-domain windows used during encoding and decoding results in time-domain aliasing when using critically sampled transforms. This aliasing results in a unique noise or distortion in both the time and frequency domains. Thus, the window parameter, either in the encoder or the decoder, may be modulated to convey a watermark that is detectable in the decoder output. Distortion, in this sense, is defined as the difference between the coded and original signals, and may or may not result in audible artifacts. In a preferred embodiment, the alpha (slope) values of the time-domain window are modulated. By introducing a noise or distortion signal that is imperceptible but related to and hidden by the source signal, it is extremely difficult to remove or obscure the resulting watermark without creating a perceptible impairment.

[0128] Another parameter of the time-domain window that may be changed in order to convey a watermark is the type of window itself. For example, a Kaiser-Bessel Defined window may be used to embed a watermark bit of “0”, while a Hanning window may be used to embed a watermark bit of “1”. The modulated window change may be done in the encoder or in the decoder.

[0129] Additionally, in order to improve detectability and minimize perceptibility, the window parameter may be modulated adaptively in time depending on signal characteristics. For instance, transient signals may obscure the watermark signal; it is therefore advantageous to be able to detect these signals and modulate the window so as to relocate the position of the watermark signal to take advantage of psychoacoustic temporal masking effects. Furthermore, the strength of the modulation and, consequently, the strength of the watermark signal in the decoded signal may be adaptively modified depending on the source signal characteristics. The amount by which the window parameters mismatch directly affects the strength of the added distortion. Therefore, the psychoacoustic masking characteristics of the input signal may be analyzed and used to signal the watermark embedding process to vary the amount of the mismatch for a watermark symbol so that it is maximally masked by the signal content.

[0130] The direct-form forward TDAC transform equation is given by:

$$X(k) = -\frac{2}{N} \sum_{n=0}^{N-1} x(n)\, w(n) \cos\left( \frac{2\pi}{N} \left( k + \tfrac{1}{2} \right) \left( n + n_{0} \right) \right), \quad 0 \leq k < N/2$$

[0131] where

[0132] n=sample number

[0133] k=frequency bin number

[0134] x(n)=input PCM sequence

[0135] w(n)=window sequence

[0136] X(k)=output transform coefficient sequence

[0137] N=total number of samples in the transform

[0138] n0=half of the total number of samples in the transform

[0139] The TDAC transform window sequences using Kaiser-Bessel defined (KBD) windows can be defined by the following equations:

$$W_{KBD}(n, \alpha, N) = \sqrt{\frac{\sum_{p=0}^{n} W_{KB}(p, \alpha, N)}{\sum_{p=0}^{N/2} W_{KB}(p, \alpha, N)}}$$

[0140] where $W_{KB}$ is the Kaiser-Bessel kernel window function, defined as:

$$W_{KB}(p, \alpha, N) = \frac{I_{0}\left[ \pi \alpha \sqrt{1 - \left( \frac{p - N/4}{N/4} \right)^{2}} \right]}{I_{0}(\pi \alpha)}$$

[0141] and $I_{0}$ is the zeroth-order modified Bessel function of the first kind, defined as:

$$I_{0}(x) = \sum_{k=0}^{\infty} \left[ \frac{(x/2)^{k}}{k!} \right]^{2}$$

[0142] FIG. 15 illustrates five overlapping encoder windows of length 256. The watermark is inserted in the encoding phase by using an α=4 value for window number 5. It should be noted that windows 4 and 6 are hybrid windows that use a combination of α=3 and α=4 windows to provide a smooth transition between the series of α=3 windows and the single α=4 window. In the figure, the decoder windows implement α=3 windows for all transforms. This mismatch in window types introduces time-domain aliasing artifacts in the resulting output signal. The amount of time-domain aliasing introduced into the decoded audio increases as the difference between the encoder α value (α=4) and decoder α value (α=3) increases, and the aliasing exists only in the section of the audio that was processed by encoder window number 5. This method of α alteration does not require decoders to be modified in order to convey watermarked signals and is useful for watermarking at the source of distribution of the signal.
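The effect of an encoder/decoder α mismatch can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the coder of the specification: quantization is omitted, the hybrid transition windows of FIG. 15 are not modeled, the second half of each window is assumed to mirror the first, the phase term is assumed to be n0 = (N/2 + 1)/2 as in the commonly used TDAC form, and the scale factors are chosen for unity gain rather than matching the −2/N factor of the forward equation. All function names are hypothetical.

```python
import numpy as np

def kbd_window(N, alpha):
    """Length-N KBD window from the kernel and cumulative-sum equations."""
    half = N // 2
    p = np.arange(half + 1)                        # kernel index p = 0..N/2
    arg = 1.0 - ((p - N / 4) / (N / 4)) ** 2
    kernel = np.i0(np.pi * alpha * np.sqrt(np.clip(arg, 0.0, None)))
    kernel = kernel / np.i0(np.pi * alpha)
    rising = np.sqrt(np.cumsum(kernel)[:half] / np.sum(kernel))
    # Assumption: the falling half is the mirror image of the rising half.
    return np.concatenate([rising, rising[::-1]])

def tdac_basis(N):
    """Cosine kernel cos(2*pi/N * (k + 1/2) * (n + n0)), shape (N/2, N)."""
    n = np.arange(N)
    k = np.arange(N // 2)
    n0 = (N / 2 + 1) / 2                           # assumed TDAC phase term
    return np.cos(2 * np.pi / N * np.outer(k + 0.5, n + n0))

def code_and_decode(x, N, enc_alphas, dec_alpha=3.0):
    """Window/transform each block with its own encoder alpha, then inverse
    transform, window with a fixed decoder alpha, and overlap-add."""
    hop = N // 2
    basis = tdac_basis(N)
    w_dec = kbd_window(N, dec_alpha)
    y = np.zeros(len(x))
    for b, alpha in enumerate(enc_alphas):
        seg = x[b * hop : b * hop + N]
        coeffs = basis @ (kbd_window(N, alpha) * seg)            # forward
        y[b * hop : b * hop + N] += (4.0 / N) * w_dec * (basis.T @ coeffs)
    return y

N, hop = 256, 128
x = np.random.default_rng(0).standard_normal(6 * hop)
matched = code_and_decode(x, N, [3.0] * 5)                 # no watermark
marked = code_and_decode(x, N, [3.0, 3.0, 4.0, 3.0, 3.0])  # one alpha=4 block
alias = marked - x   # non-zero only where the alpha=4 block was applied
```

With matched windows the interior samples are reconstructed to within rounding error, whereas the single α=4 block leaves a residual confined to the samples that block covers.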

[0143] FIG. 16 again illustrates five overlapping windows of length 256; however, in this example, the α window value is altered during the decoding process with inverse TDAC transform windows. Again, time-domain aliasing occurs, injecting a watermark signal into the decoded signal. However, in this example, the embedded signal is injected at the decoder, allowing watermark information to be introduced for a specific end user or device. This α modification allows the decoder to embed serialized information in the signal data.

[0144] It may be beneficial to use shorter transform windows when applying the watermark since they reduce the duration of the aliasing distortion and they are generally used during transient conditions (in audio coding). The temporal masking characteristics for the transient signals may be exploited to use values of alpha that more greatly differ from the “correct” value and thereby produce a more robust watermark.

TDAC Window Modulation Detector

[0145] By modifying the value of alpha of the TDAC windows, a time-domain aliasing signal is introduced that is related to the coded signal. This aliasing can be measured as the introduction of spectral noise or distortion of the spectral components of the coded signal.

[0146] One possible detection method may compare the difference between the source material and the watermarked data as in the manner of the FIG. 4 and FIG. 5 arrangements. This method would search the difference signal for spectral distortion where the watermark modified window was used. If the spectral distortion exceeded a threshold, this would be indicated as a ‘1’ symbol for the watermarked section of data. Spectral distortion below a threshold would be detected as a ‘0’ symbol.
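A minimal sketch of such a difference-based detector is given below. The function name, the Hann analysis window and the expression of the threshold as a distortion-to-signal ratio in dB are illustrative assumptions; the specification does not fix how the spectral distortion is measured.

```python
import numpy as np

def detect_symbol(source_block, decoded_block, threshold_db):
    """Return 1 if the spectral distortion of the difference signal,
    relative to the source energy, exceeds threshold_db; otherwise 0."""
    source_block = np.asarray(source_block, dtype=float)
    diff = np.asarray(decoded_block, dtype=float) - source_block
    win = np.hanning(len(diff))
    distortion = np.sum(np.abs(np.fft.rfft(diff * win)) ** 2)
    reference = np.sum(np.abs(np.fft.rfft(source_block * win)) ** 2)
    # Small constants guard against log(0) for silent or identical blocks.
    ratio_db = 10.0 * np.log10((distortion + 1e-20) / (reference + 1e-20))
    return 1 if ratio_db > threshold_db else 0
```

The two blocks are assumed to be time-aligned already, for example by the synchronization discussed in connection with FIGS. 4 and 5.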

[0147] This method is sensitive to wide band noise that may be introduced to mask the watermarked signal. Another detection method is to track spectral peaks of the watermarked signal and look for the amplitude modulation of the frequency bins both before and after the spectral peak that is introduced by time-domain aliasing in the watermarking application. Similar to the general spectral distortion method described above, this detection method would compare the frequency bins surrounding predominant spectral components to a threshold. However, this threshold would be related to the strength of the source signal's spectral component. Spectral side lobes below the threshold would be interpreted as a ‘0’ symbol and spectral side lobes above would be interpreted as a ‘1’ symbol.
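The peak-tracking approach might be sketched as follows. The single-peak search, the number of bins skipped around the peak (`skip`), the number of bins examined on each side (`span`) and the periodic Hann analysis window are all hypothetical choices made for this illustration.

```python
import numpy as np

def sidelobe_symbol(block, rel_threshold_db, skip=3, span=4):
    """Compare the bins on either side of the strongest spectral peak with a
    threshold expressed relative to the peak level; return 1 or 0."""
    block = np.asarray(block, dtype=float)
    length = len(block)
    # Periodic Hann window, written out so bin-centred tones do not leak.
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(length) / length)
    spectrum = np.abs(np.fft.rfft(block * win))
    peak = int(np.argmax(spectrum))
    # Skip the main lobe, then take `span` bins below and above the peak.
    below = spectrum[max(peak - skip - span + 1, 0) : max(peak - skip + 1, 0)]
    above = spectrum[peak + skip : peak + skip + span]
    side = np.concatenate([below, above])
    if side.size == 0:
        return 0
    rel_db = 20.0 * np.log10((side.max() + 1e-30) / (spectrum[peak] + 1e-30))
    return 1 if rel_db > rel_threshold_db else 0
```

Because the threshold is relative to the peak, the decision scales with the strength of the source signal's spectral component, as the paragraph above requires.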

Modulating TNS Filter Coefficients

[0148] Temporal noise shaping is a coding technology that can help to prevent pre-echo artifacts in perceptual audio coding; it is described in “Enhancing the Performance of Perceptual Audio Coders by Temporal Noise Shaping (TNS)” by Jurgen Herre and James Johnston, 101st AES (Audio Engineering Society) Convention Preprint 4384, Nov. 8-11, 1996. Predictive coding in the frequency domain is used to shape the quantization noise in the time domain. The prediction can help to control where the quantization noise is placed in the time domain. In the case of audio coding, the noise is constrained within the amplitude envelope of the time-domain masking signal to help prevent pre-echo. Pre-echo is an artifact that occurs during transient conditions when the applied frequency transform does not have enough time resolution to prevent quantization noise from occurring before the transient in the output signal.

[0149] Although temporal noise shaping (TNS) is a feature of the MPEG-2 AAC perceptual coding system, it may be applied to other systems, such as Dolby Digital, thus providing a further way to modulate parameters in such other systems.

[0150] In accordance with this aspect of the present invention, one or more TNS filter parameters are modulated. In particular, the TNS noise shaping filter order and TNS noise shaping filter shape may be modulated, as explained further below.

[0151] The TNS process involves the steps of:

[0152] 1. Decomposing the signal into spectral coefficients by using a time-to-frequency transform,

[0153] 2. Applying a standard linear-predictor by forming a windowed autocorrelation matrix and using recursion, and

[0154] 3. If the prediction gain exceeds a certain threshold, a noise-shaping filter is applied to the spectral coefficients.

[0155] The invention relies on the properties of the noise-shaping filter that is applied during TNS processing. The spectral-domain filter may be modified in such a way as to shape the noise in any number of different temporal responses. By varying certain parameters of this temporal envelope via spectral-domain filtering, a watermark may be embedded in the signal. In other words, one modulates the noise-shaping filter in the spectral or frequency domain, which thereby changes the quantization noise in the time domain.

[0156] An exemplary temporal envelope response, plotting sound pressure level (SPL) versus time, is illustrated in FIG. 17.

[0157] The temporal masking model is quite similar to the spectral masking model used in certain perceptual coders. In particular, the downward and upward envelopes for spectral masking are analogous to the backward and forward temporal masking envelopes. In order to identify more specifically the TNS parameters that may be modulated in accordance with an aspect of the present invention, it is useful to consider in more detail a portion of the operation of the temporal noise shaping process. After decomposing the signal into spectral coefficients by using a time-to-frequency transform, a linear predictive coding (LPC) calculation is performed on the spectral data to determine if the prediction gain exceeds a certain threshold and to derive an envelope of the signal. The prediction coefficients are then computed for each TNS filter for each block as:

$h = R_{xx}^{-1}\, r_{xx}$

where

$r_{xx}^{T} = \{R_{xx}(i,j)\}$; $R_{xx}(i,j) = \mathrm{AutoCorr}(|i-j|)$; $i, j = 1, 2, \ldots, N$

$r_{xx}' = r_{xx} * win$

[0158] where $R_{xx}$ is the N-by-N square autocorrelation matrix, N is the TNS prediction order, and h is the vector of optimized prediction coefficients. These equations are based on the well-known orthogonality principle, which states that the minimum prediction error is orthogonal to all data used in the prediction.
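By way of illustration only, the recursion referred to in step 2 above may be sketched as a Levinson-Durbin solution of the Toeplitz system $R_{xx} h = r_{xx}$. This is a minimal sketch under stated assumptions and is not taken from any particular coder: the function names `autocorr` and `levinson_durbin` are hypothetical, the windowing of the autocorrelation values is omitted, and the prediction gain is assumed to be the ratio of the zero-lag autocorrelation to the final prediction error.

```python
def autocorr(x, max_lag):
    """Return AutoCorr(0..max_lag) of the sequence x (hypothetical helper)."""
    n = len(x)
    return [sum(x[t] * x[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations by recursion.

    r[0..order] are autocorrelation values. Returns (h, reflection, error),
    where h[k] weights the sample k+1 positions back, reflection holds the
    reflection coefficients, and error is the final prediction error energy.
    """
    h = [0.0] * order
    reflection = []
    error = r[0]
    for m in range(order):
        if error <= 0.0:
            # Degenerate input: no further prediction is possible.
            reflection.extend([0.0] * (order - m))
            break
        acc = r[m + 1] - sum(h[k] * r[m - k] for k in range(m))
        k_m = acc / error
        reflection.append(k_m)
        prev = h[:]
        for k in range(m):
            h[k] = prev[k] - k_m * prev[m - 1 - k]
        h[m] = k_m
        error *= (1.0 - k_m * k_m)
    return h, reflection, error


# Example: autocorrelation of a first-order process, AutoCorr(lag) = 0.5 ** lag.
r = [1.0, 0.5, 0.25]
h, reflection, error = levinson_durbin(r, 2)
prediction_gain = r[0] / error  # assumed definition of prediction gain
```

In this example only the first coefficient is non-zero, which is the situation in which trailing coefficients could be dropped when the filter order is chosen.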

[0159] At initialization time, an autocorrelation matrix window is computed according to the equation:

$win(i = 0..31) = e^{\left(i + \frac{1}{2}\right)^{2} \cdot gaussExp}$

where

$gaussExp = -\frac{1}{2}\left(\pi \cdot F_{SAMP} \cdot 0.001 \cdot \frac{timeResolution}{transformResolution}\right) transformResolution$

[0160] where

[0161] $F_{SAMP}$ = signal sample rate

[0162] The timeResolution variable is dependent on the bit rate and number of channels. Likewise, the transform block length defines the transformResolution variable.

[0163] The optimal order of the noise-shaping filter is determined by removing reflection coefficients below a certain threshold from the end of the coefficient array. One parameter that may be modulated in order to convey a watermark is the noise-shaping filter order. For example, a watermark bit of one sense may be represented by the optimal filter order and a watermark bit of the other sense may be represented by a non-optimal filter order (either lower or higher). Another parameter that may be changed in order to convey a watermark is the shape of the noise-shaping filter itself. For example, a watermark bit of one sense may be indicated by using the optimal coefficients determined by the LPC calculation, while a watermark bit of another sense may be indicated by modifying the coefficients, and thus the shape of the noise-shaping filter.
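The order selection and order modulation just described may be sketched as follows. This is an illustrative sketch only: the function names, the truncation threshold of 0.1, and the choice of shifting the order by exactly one are assumptions made for the example, not values given in this description.

```python
def optimal_order(reflection, threshold=0.1):
    """Drop trailing reflection coefficients whose magnitude is below threshold."""
    order = len(reflection)
    while order > 0 and abs(reflection[order - 1]) < threshold:
        order -= 1
    return order


def modulated_order(reflection, bit, threshold=0.1):
    """Return the optimal order for bit 0 and a non-optimal order for bit 1.

    The non-optimal order is one lower than optimal when possible, otherwise
    one higher (capped at the number of available coefficients).
    """
    best = optimal_order(reflection, threshold)
    if bit == 0:
        return best
    if best > 1:
        return best - 1
    return min(best + 1, len(reflection))


reflection = [0.6, -0.3, 0.05, 0.02]
order_for_zero = modulated_order(reflection, 0)  # optimal order
order_for_one = modulated_order(reflection, 1)   # deliberately non-optimal
```

A detector that can estimate the filter order in use for each block would read the bit from whether that order matches the optimal one; how that estimate is made is outside this sketch.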

[0164] By modulating the TNS parameters (filter order or filter coefficients), noise is modulated in the temporal envelope of the input signal such that it may be detected in the decoded output signal. FIG. 18 shows an example of a temporal masking envelope and the variability with which the quantizer error may be modulated within that envelope. With each block in time, the TNS parameters may be modulated to convey a watermark.

[0165] Practical embodiments of the present invention can provide a very robust watermarking solution. Since the noise that is added by the TNS process is tightly coupled to the envelope of the source signal, it is very difficult to remove or obscure the watermark without degrading the original signal.

[0166] The transparency of the watermark described in this invention may be controlled by using an adaptive distortion process of the type described below. In this case, once the temporal envelope of the signal has been modified using TNS, the results are iteratively compared with either a temporal or spectral representation of the temporal masking threshold. If the threshold is exceeded, adjustments are made to the temporal masking parameters and the process is repeated to ensure the desired balance between robustness and perceptibility of the watermarked signal.

[0167] The temporal masking characteristics shown in FIG. 18 may be applied to sub-bands of the signal. This allows layering of watermarks along with potentially more locations to embed the watermark.

Modulating Bandwidth

[0168] It is known that reducing the bandwidth of an audio signal causes minimal degradation to the subjective quality as long as it remains above a minimum level of approximately 16 kHz. Experiments have also shown minimal degradation when the bandwidth is changed dynamically as long as it remains above the minimum level. If the bandwidth is modulated in accordance with a supplemental or watermark signal in the encoder or the decoder, that signal may be derived from the decoded audio. For example, a one-bit code may be embedded in an audio signal where a bandwidth of 16 kHz represents a “0” symbol and a bandwidth of 20 kHz represents a “1” symbol. This can be expanded to multiple bandwidths representing multi-bit symbols, creating a higher embedded signal data rate. FIG. 19 illustrates a 2-bit symbol using four different bandwidths. This strategy can be applied where non-robust, inaudible watermarks are required. The inaudibility criterion can be met as described above. This strategy is non-robust because the watermark can easily be removed by low-pass filtering the decoded audio signal.
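The mapping from watermark bits to bandwidths may be sketched as follows. The 16 kHz and 20 kHz end points come from the one-bit example above; the two intermediate bandwidths in the four-entry table are hypothetical values chosen only for illustration, as is the function name.

```python
# Hypothetical four-bandwidth table for a 2-bit symbol; only the end points
# (16 kHz and 20 kHz) appear in the description.
BANDWIDTHS_HZ = [16000, 17000, 18500, 20000]


def bits_to_bandwidths(bits, table=BANDWIDTHS_HZ):
    """Map a flat list of 0/1 bits to one bandwidth per symbol."""
    bits_per_symbol = (len(table) - 1).bit_length()
    if len(table) != 1 << bits_per_symbol:
        raise ValueError("table size must be a power of two")
    if len(bits) % bits_per_symbol:
        raise ValueError("bit count must be a multiple of the symbol size")
    bandwidths = []
    for start in range(0, len(bits), bits_per_symbol):
        index = 0
        for bit in bits[start:start + bits_per_symbol]:
            index = (index << 1) | bit
        bandwidths.append(table[index])
    return bandwidths


two_bit_plan = bits_to_bandwidths([0, 0, 0, 1, 1, 0, 1, 1])
one_bit_plan = bits_to_bandwidths([0, 1, 1], table=[16000, 20000])
```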

[0169] FIG. 20 shows an example of an audio signal that contains an embedded signal using the bandwidth of the signal to represent the different symbols.

[0170] One problem with the bandwidth watermarking technique described above is that it depends upon the existence of signal content above the minimum bandwidth. For much of the time, signal content above the minimum bandwidth does not exist. A constant embedded signal data rate cannot be attained without high frequency signal content. For example, if the audio signal content consists of a single sine wave at 1 kHz, the only possible way to transmit embedded data in this signal would be to reduce the bandwidth to below 1 kHz. This would be clearly audible and destroy the original signal.

[0171] A method that may provide a constant watermark-embedding rate is to ensure that the audio signal contains high frequency energy. One way to achieve this is to add noise to the upper frequencies of the audio signal in such a way that a listener does not perceive it. If the noise added is less than or equal to the human threshold of hearing, it is not perceptible. With the addition of this noise, the embedded signal can use the audio bandwidth as a signaling mechanism that provides a constant data rate. Note that this noise only needs to be added within the signaling band. This signaling band is defined as the band between the lowest frequency and the highest frequency used to place the watermark. The signaling band can be divided into smaller sections where more than two bandwidths are employed to create the watermark.

[0172] FIG. 21 illustrates the addition of the noise shaped to the approximate level of the hearing threshold. It is added to a signal that consists of only a single sine wave and it is added only in the signaling band. The noise added to the signaling band does not have to be limited by the hearing threshold, but it would probably be audible if its energy were above that threshold. Another dimension of signaling can be added by adjusting the amplitude of the noise below the hearing threshold. For example, additional data may be hidden or inserted by adding a half-energy state, so that the energy in a region of the signaling band has more than just an energy state and a no-energy state. This amplitude signaling would increase the data rate of the embedded signal.

[0173] The signal is detectable as long as some signal content is ensured just below the upper bandwidth. It is important that the added signal within the signaling band is similar in each channel. In many cases, these signals are mixed electrically or acoustically and it is important that they do not cancel each other. If in-phase sine waves were added to multiple channels and used for signaling, they would cancel when added acoustically depending upon location. This reduces the reliability of the watermark. Using independent random noise is a better solution because it does not cancel when mixed.

[0174] Since signal content may occur in the signaling band and shaped noise is added in the signaling band to guarantee a constant embedding rate, the two signals are added and occasionally increase the energy in the signaling band. This energy variability makes the detection process more difficult. In a preferred embodiment of this aspect of the invention, a low-pass filter is applied to the source signal prior to the addition of the shaped noise to eliminate any source signal interaction in the signaling band.

[0175] In the Dolby Digital algorithm or coding process, even if the content in the upper frequency bands is determined to be insignificant, a coarse power spectrum is transmitted in the bitstream that can be used in the decoder to add random noise shaped to the power spectrum. This is a feature of the decoder that is turned on when the dither flag in the bitstream is enabled. The added noise in the decoder recreates the watermark in the decoded audio even if the encoder has judged it perceptually insignificant. The watermark may be inserted during either the encoding or the decoding process.

[0176] A Dolby Digital audio coder is capable of generating changes in the bandwidth in accordance with one of two bandwidth parameters (the chbwcod and cplendf codes listed above in the table of FIG. 21). This creates an efficient way of implementing the watermark. However, modulating these codes to generate detectable changes in the decoded signal does put some limitations on the embedded signal data rate:

[0177] 1. All channels should contain the same bandwidth so that downmixing the signal does not destroy the embedded data. This limits the embedded data rate to the equivalent of a mono channel.

[0178] 2. For optimal sound quality, the bandwidth code should only be set once per frame, which limits the embedded data rate to the symbol depth and encoded sample rate. If the bandwidth code were changed more than once per frame, the overall sound quality of the coded audio would be reduced.

[0179] 3. The number of available symbols is limited to the number of available bandwidth codes above the minimum bandwidth.

[0180] For example, if the coder is using two different bandwidth states to embed data at 48 kHz, the embedded data rate is 31.25 bps (31.25 frames per second, each containing one bit of information). If it is using four bandwidth states at 48 kHz, the data rate is 62.5 bps. These numbers are derived from the fact that each Dolby Digital frame contains 1536 unique audio samples. If another coder were used that contained 2048 unique audio samples per frame, the data rate would be approximately 23.4 bps for a one-bit code.
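The figures above follow from one symbol per frame, with each symbol carrying the base-2 logarithm of the number of bandwidth states. A short sketch of the arithmetic, using a hypothetical helper name, is:

```python
import math


def embedded_bit_rate(sample_rate_hz, samples_per_frame, num_states):
    """Bits per second when one bandwidth symbol is sent per frame."""
    frames_per_second = sample_rate_hz / samples_per_frame
    return frames_per_second * math.log2(num_states)


rate_two_states = embedded_bit_rate(48000, 1536, 2)   # 31.25 bps
rate_four_states = embedded_bit_rate(48000, 1536, 4)  # 62.5 bps
rate_long_frame = embedded_bit_rate(48000, 2048, 2)   # 23.4375 bps
```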

[0181] The Dolby Digital coder sends an approximation of the power spectral density in the encoder bitstream with each audio frame. It is updated every time there is a significant change in the audio spectrum. The power spectral density information is sent as exponents that are linearly spaced in frequency. In the Dolby Digital decoder, dither is added to any portion of the spectrum that received no quantized information because the signal information was not considered important. The dither, which is essentially random noise, is scaled to the level of the exponent. This adds signal energy to that portion of the spectrum. If the exponents in the signaling band are shaped to less than or equal to the hearing threshold, the dither guarantees signal energy.

[0182] The following steps outline the current method of assuring that there is energy in the signaling band within a Dolby Digital encoded signal.

[0183] 1. Random noise, shaped to be at or below the hearing threshold, is added above the minimum signaling bandwidth. This causes the minimum energy to follow the shape of the hearing threshold.

[0184] 2. The exponents that are calculated after the noise addition capture this minimum energy level.

[0185] 3. The decoder recreates the spectral energy from the transmitted exponents even if no bits have been allocated above the minimum signaling bandwidth because dither is usually added. This ensures signal content for the embedded signaling.

[0186] The two techniques described above (bandwidth variation and dither) can be used to integrate a low-complexity, fixed bit-rate watermark into a Dolby Digital encoder or decoder. This system is robust against “normal use” of the encode/decode chain, which includes downmixing, dynamic range control, volume normalization, matrix surround decoding, etc.

[0187] Thus, an embodiment of this aspect of the present invention may include the following steps:

[0188] 1. Adjusting the bandwidth to embed a hidden data signal.

[0189] 2. Using a bandwidth code of the Dolby Digital encoding/decoding system to adjust the bandwidth to embed a hidden data signal.

[0190] 3. Adding noise in the signaling band to ensure signal content can be used to embed data at a constant rate.

[0191] 4. Shaping this added noise to be less than or equal to the human threshold of hearing to prevent audible perception of the added noise.

[0192] 5. Adjusting the amplitude of this added noise to add another dimension of signaling to increase the data rate of the embedded signal.

[0193] 6. Integrating the shaped noise with a Dolby Digital coder to guarantee signal content within the signaling band.

[0194] The watermark detector interprets the embedded information contained within the reproduced audio signal. It is preferably capable of extracting the information both electrically and acoustically, but this capability may not be necessary for all applications. Extracting the watermark after acoustic processing is considered a more difficult challenge because of the addition of room noise, speaker and microphone characteristics, and overall playback volume.

[0195] The goal of the detector is to determine if there is energy within a given signaling band to find the bandwidth of the audio. This requires a frequency decomposition of the audio that can be calculated by a Fourier transformation, a group of bandpass filters that analyze the signaling band, etc. The energy in each signaling band can be obtained from this signal decomposition. A detector can use this energy information to determine the embedded symbol.
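One way to obtain the energy in a signaling band from a Fourier transformation is sketched below. It is a minimal, unoptimized sketch using a direct discrete Fourier transform over only the bins inside the band; the function name, the block length and the band edges in the example are assumptions made for illustration, and a practical detector would more likely use an FFT or a bank of bandpass filters.

```python
import cmath
import math


def band_energy(samples, sample_rate_hz, low_hz, high_hz):
    """Sum of squared DFT magnitudes (divided by N) for bins in [low_hz, high_hz]."""
    n = len(samples)
    k_low = math.ceil(low_hz * n / sample_rate_hz)
    k_high = min(math.floor(high_hz * n / sample_rate_hz), n // 2)
    energy = 0.0
    for k in range(k_low, k_high + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples))
        energy += abs(coeff) ** 2
    return energy / n


# Example: a block containing an 18 kHz tone sampled at 48 kHz.
fs = 48000
block = [math.sin(2 * math.pi * 18000 * t / fs) for t in range(480)]
energy_occupied = band_energy(block, fs, 17000, 19000)
energy_empty = band_energy(block, fs, 19500, 20500)
```

The occupied band shows a large energy and the empty band an energy near zero, which is the distinction the detector needs in order to infer the bandwidth in use.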

[0196] One possible detection method applies a fixed threshold comparison in each signaling band to determine the encoded symbol. This threshold may be set at the energy level just above the noise floor. Anything above this level would be considered to contain signal level. FIG. 22 shows three different energy levels required for detecting four different bandwidths that create a 2-bit symbol. Any energy above the detection threshold is considered ‘high’ and anything below is considered ‘low’.
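The fixed-threshold decision may be sketched as follows. The sketch assumes that the band energies are ordered from the lowest signaling band to the highest and that the symbol index equals the number of consecutive ‘high’ bands counted from the lowest; the function name and threshold value are hypothetical.

```python
def decode_symbol(band_energies, threshold):
    """Return the bandwidth index implied by consecutive 'high' bands.

    band_energies are ordered from the lowest signaling band upward.
    Index 0 is the lowest bandwidth (no band 'high').
    """
    index = 0
    for energy in band_energies:
        if energy <= threshold:
            break
        index += 1
    return index


lowest_bandwidth = decode_symbol([0.1, 0.1, 0.1], threshold=1.0)
third_bandwidth = decode_symbol([5.0, 5.0, 0.1], threshold=1.0)
highest_bandwidth = decode_symbol([5.0, 5.0, 5.0], threshold=1.0)
```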

[0197] This fixed threshold only works well in a closed environment where the noise floor of the system is always known and the peak signal levels are never attenuated. For example, if any other noise were added to the noise floor in the above diagram, the third energy level would be considered ‘high’ and an incorrect symbol would be interpreted.

[0198] It is possible to use a fixed threshold if the energy levels are equalized or normalized before the threshold calculation. One technique that would accomplish this applies an AGC algorithm or process to the signaling band before the energy levels are determined. These levels are normalized by the AGC so that the ‘low’ and ‘high’ levels become more consistent. A fixed threshold can be applied in this case because of the normalization of the levels.

[0199] An adaptive threshold is thought to be best for any environment where the noise levels and the signal energy are constantly changing. One possible detection method that employs an adaptive threshold uses the previous energy states to calculate a threshold for the current state. This detector works on the premise that in a finite number of the previous states for a given energy band, there should exist some energy levels that are in a ‘high’ state and some that are in a ‘low’ state. The largest energies may be considered ‘high’ while the smallest may be considered ‘low’. These ‘high’ and ‘low’ states can be considered to be two different groups. FIG. 23 contains several example histograms of the distribution of ‘high’ and ‘low’ states. A threshold may be determined that lies somewhere between these two ‘clusters’.

[0200] If the number of ‘high’ states is assumed to be equal to the number of ‘low’ states in the previous finite set, the largest half belongs to the ‘high’ group while the smallest half belongs to the ‘low’ group. If the average energy level, or mean, is found for each group, a simple threshold can be calculated as the average of these two means. This can easily become more complicated by assuming different distributions for the two groups and thresholds that take into consideration more of each group's statistics, such as mean and variance.
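The simple equal-halves threshold may be sketched as follows; the function name is hypothetical and an even number of previous states is assumed so that the two halves are the same size.

```python
def half_split_threshold(previous_energies):
    """Average of the mean of the smallest half and the mean of the largest half."""
    if len(previous_energies) < 2 or len(previous_energies) % 2:
        raise ValueError("an even number of at least two states is assumed")
    ordered = sorted(previous_energies)
    half = len(ordered) // 2
    low_mean = sum(ordered[:half]) / half
    high_mean = sum(ordered[half:]) / half
    return (low_mean + high_mean) / 2


threshold = half_split_threshold([9.0, 1.0, 8.0, 2.0, 1.5, 9.5])
```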

[0201] Another consideration may be included that improves the separation into ‘high’ and ‘low’ groups. When more than two bandwidths are included in the embedding process, the energy levels in the signaling bands are dependent. When the highest bandwidth is ‘on’, all the energy levels in each signaling band should be detected as ‘high’. When the second highest bandwidth is ‘on’, all the signaling levels below this bandwidth should be detected as ‘high’. This alters the distribution of the energy levels for each signaling band.

[0202] For example, assume that the watermark encoder is generating a two-bit symbol using four different bandwidths. Let A, B, C and D represent the bandwidths where A is the lowest bandwidth and D is the highest. Three different energy bands are required to determine these bandwidths. Let these three energy bands be represented by 1, 2, and 3, which are the energy between bandwidths A-B, B-C and C-D respectively. The following table lists the probability for each energy band to be in a ‘high’ state if the symbols are uniformly distributed.

Energy Band    P(‘high’)
1              ¾
2              ½
3              ¼

[0203] The probabilities are not equal because of the dependence of each energy band on the bandwidth. For example, the probability of signal content in energy band 1 is the sum of the probabilities of the B, C and D symbols occurring. Each symbol has a probability of ¼ of occurring; hence, the probability of signal content in energy band 1 is ¾.

[0204] If the previous forty states were used to calculate the current threshold for each energy band, the highest thirty states would be assumed to represent signal content within energy band 1. The remaining ten states would represent no signal content. The current threshold for this case is determined by averaging the means of these two groups.
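The probabilities in the table and the corresponding unequal split of the previous states may be sketched as follows. Function names are hypothetical, and the sketch assumes uniformly distributed symbols, as stated above.

```python
from fractions import Fraction


def high_probabilities(num_bandwidths):
    """P('high') for energy bands 1..num_bandwidths-1 under uniform symbols."""
    return [Fraction(num_bandwidths - band, num_bandwidths)
            for band in range(1, num_bandwidths)]


def split_threshold(previous_energies, p_high):
    """Treat the largest p_high share of states as 'high' and average the two means."""
    ordered = sorted(previous_energies)
    num_high = int(len(ordered) * p_high)
    if not 0 < num_high < len(ordered):
        raise ValueError("both groups must be non-empty")
    low = ordered[:len(ordered) - num_high]
    high = ordered[len(ordered) - num_high:]
    return (sum(low) / len(low) + sum(high) / len(high)) / 2


probabilities = high_probabilities(4)
# Forty previous states for energy band 1: thirty 'high', ten 'low'.
states = [8.0] * 30 + [2.0] * 10
band1_threshold = split_threshold(states, probabilities[0])
```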

[0205] The addition of channel coding to ensure that the symbol distribution is substantially uniform is essential for this detector. If the encoder entered a symbol that was just the highest bandwidth for an extended period, this detector would have difficulty decoding the embedded data. The closer the symbol distribution is to the assumed probability, the more accurate the detection of the embedded data is.

[0206] One possible channel coding method is to ensure that each symbol occurs only once over a finite period. For example, if there are four different bandwidth codes, each symbol may be required to occur once in a group of four symbols. This generates 24 unique symbols that are groups of four bandwidth codes. 24 (four factorial) is the maximum number of permutations of the four bandwidth codes. If A, B, C and D represent the four bandwidth codes, the symbols would look like ABCD, BACD, ABDC, BADC, BCAD, etc. Note that this reduces the embedded data rate.
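The permutation code and its effect on the data rate may be sketched as follows. The ordering of the codewords and the helper names are assumptions made for the example; any fixed ordering shared by the encoder and the detector would serve.

```python
import math
from itertools import permutations

# All orderings of the four bandwidth codes; each code appears exactly once
# per codeword, so the symbol distribution is uniform over every group of four.
CODEWORDS = ["".join(p) for p in permutations("ABCD")]


def encode(value):
    """Map an integer in 0..23 to a group of four bandwidth codes."""
    return CODEWORDS[value]


def decode(word):
    """Map a group of four bandwidth codes back to its integer value."""
    return CODEWORDS.index(word)


# Information carried per transmitted bandwidth code, compared with the
# 2 bits per code available without channel coding.
bits_per_code = math.log2(len(CODEWORDS)) / 4
```

With this code each bandwidth code carries roughly 1.15 bits instead of 2, which quantifies the data rate reduction noted above.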

[0207] Thus, a watermark detector according to this aspect of the present invention may include:

[0208] 1. An embedded signal detector that uses an adaptive threshold that is calculated by examining previous states. The previous states are separated into groups based on energy level. The threshold is based on statistics of each group that try to separate the groups as much as possible.

[0209] 2. When multiple groups are involved, the number of elements in the groups is adjusted based on dependencies from the bandwidth adjustment.

[0210] 3. A channel coder that ensures that the distribution of the symbols is close to uniform over a finite time. This ensures that the watermark detector described above functions properly.

Controlling Strength of the Parameter Modulation Adaptive Distortion Control

[0211] One goal of the present invention is to embed a watermark having maximized detectability and minimized perceptibility. Perceptual encoders use a threshold of perceptibility to determine how to reduce the redundancy of an input signal. This same threshold can be used to adjust the watermark signal in a way that is detectable while remaining substantially imperceptible.

[0212] As mentioned above, in some perceptual encoders, a distortion measurement is paired with the rate control to ensure that the correct information is discarded. A distortion measurement compares the original input signal with the encoded signal (output of the rate control). The distortion measure may be useful to control some of the coding parameters to change the outcome of the rate control process. This may create a nested loop structure, described below, in which the outer loop contains a distortion measure and the inner loop is the rate control. Modifications are made iteratively to the coding parameters by examining the distortion measurement until some criteria are met. The same approach may be applied to variable data rate encoders, by removing the rate loop.

[0213] The process for embedding a watermark using a threshold of perceptibility according to an aspect of the present invention is shown in FIGS. 24-26. This process is similar to that defined in the MPEG-2 AAC perceptual coder in which two nested loops are used to determine the optimal quantization. The inner iteration loop, shown in FIG. 24, modifies the quantizer step size until the spectral data can be coded with the number of available bits (rate control). The outer iteration loop, shown in FIG. 25, amplifies the spectral coefficients in all spectral bands in a way that the demands of the psychoacoustic model are fulfilled as much as possible (distortion control). The process of FIG. 25 is modified by modulating a perceptual coding parameter or parameters (shown in FIG. 26) to fulfill the psychoacoustic model, or perceptual threshold, as much as possible while also embedding a watermark signal. All of the parameters listed in the tables of FIGS. 6, 7 and 8 may be modulated in this way, although some parameters are more difficult than others to change during the bit allocation process.

[0214] The rate control process in FIG. 24 attempts to represent the signal by a smaller fixed amount of information. The input signal is quantized according to the perceptual threshold (step 20) and the bits used as a result of the quantization are counted (step 22). If the number of used bits does not exceed the available bits, then the process is finished (step 24). Alternatively, the iterative process continues until the number of bits used matches as closely as possible the number of available bits. This is usually accomplished by adjusting the perceptual threshold, via quantizer step size modifications, until enough information has been discarded (step 26).
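The rate loop of FIG. 24 may be sketched in simplified form as follows. This is not the bit allocation of any actual coder: the bit-cost model in `bits_needed`, the uniform quantizer, and the step growth factor are all hypothetical stand-ins chosen so that the loop structure (quantize, count bits, coarsen the step size, repeat) is visible.

```python
def bits_needed(quantized):
    """Hypothetical cost model: 1 bit for a zero, sign plus magnitude bits otherwise."""
    return sum(1 if q == 0 else 1 + abs(q).bit_length() for q in quantized)


def rate_loop(coefficients, available_bits, step=1.0, growth=1.25,
              max_iterations=200):
    """Coarsen the quantizer step size until the coded data fits the bit budget."""
    for _ in range(max_iterations):
        quantized = [round(c / step) for c in coefficients]  # quantize (step 20)
        if bits_needed(quantized) <= available_bits:         # count and test (steps 22, 24)
            return quantized, step
        step *= growth                                        # discard more information (step 26)
    raise RuntimeError("bit budget not reached")


coefficients = [100.0, -37.5, 12.0, 3.0, 0.4]
quantized, final_step = rate_loop(coefficients, available_bits=12)
```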

[0215] A distortion measuring process, shown in FIG. 25, may be added to the quantizer step size process to ensure that some of the simplifications of the rate control encoding process have not caused errors that are easily perceived. The distortion measure allows fine-tuning of coding parameters to minimize such errors. In the first step of the process, the rate loop, or inner loop, is performed to quantize the input signal according to a rate constraint (step 28). Then a distortion evaluation calculates how much distortion exists (step 30) and determines whether the distortion is acceptable relative to a perceptual threshold (step 32). If the distortion is not acceptable, the spectral coefficients are amplified (step 34) and the process is repeated. If the distortion is acceptable, the result of the quantization is applied to the input signal (step 36) and the process is completed. “Distortion”, in this sense, is the difference between the coded and original signals, and may or may not result in audible artifacts.

[0216] In aspects of the present invention, a distortion measure process, shown in FIG. 26, is used to determine the amount that a coding parameter value may be varied from its default value when modulated and yet stay within the bounds of the perceptual threshold. This maximizes the possible detection of the watermark because it preferably causes as much distortion as possible, constrained by the perceptual threshold, without the distortion being perceptible. The rate control (step 28), distortion control (step 32), and coding parameter adjustment (step 38) steps are repeated until an acceptable compromise is made.
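The outer loop of FIG. 26 may be sketched as a search for the largest parameter deviation whose resulting quantizer error stays at or below the perceptual threshold in every band. In this sketch `error_for_depth` is a hypothetical callback standing in for the rate control and distortion evaluation steps, and the toy error model in the example is invented purely to exercise the loop.

```python
def max_modulation_depth(threshold_db, error_for_depth, max_depth):
    """Largest integer depth whose per-band error does not exceed the threshold.

    error_for_depth(depth) returns the per-band quantizer error in dB that
    results from modulating the coding parameter by that depth.
    """
    best = 0
    for depth in range(max_depth + 1):
        error_db = error_for_depth(depth)                # rate control + distortion evaluation
        if any(e > t for e, t in zip(error_db, threshold_db)):
            break                                        # distortion would become perceptible
        best = depth                                     # acceptable: try a larger deviation
    return best


# Toy model: modulation mainly raises the error in the middle band, with a
# smaller side effect on the other bands.
threshold_db = [60.0, 55.0, 50.0]


def toy_error(depth):
    return [50.0 + 0.5 * depth, 50.0 + 2.0 * depth, 44.0 + 0.5 * depth]


depth = max_modulation_depth(threshold_db, toy_error, max_depth=10)
```

The side effect on the other bands in the toy model mirrors the observation, made below in connection with FIG. 27, that modulating a parameter in one band changes the error across the spectrum.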

[0217] Certain coding systems, such as Dolby Digital, use a rate control process during encoding but do not apply distortion control. Therefore, in order for such coding systems to employ this aspect of the invention, a distortion measure is added. Other coders, such as MPEG-2 AAC, already have the distortion control process integrated for the purposes of coding and with minor modifications may be used also to apply a watermark according to this aspect of the present invention. It should be noted that in variable-rate coding systems, the rate loop is not required, thus providing an optimal solution to the parameter modulation process while also reducing complexity.

[0218] FIG. 27 illustrates how a watermark may be embedded according to the present invention using a distortion measuring process of the type just described. Preferably, the goal is to maximize robustness by forcing the effect of the modulated parameter, which is illustrated as the change in quantizer error in pass 2, as close to the perceptual threshold as possible. In the first pass, the perceptual threshold is calculated. In the second pass, the quantizer error is shown. Note that there is some margin available with which to modify the quantizer error imperceptibly. In pass 3, the chosen watermark coding parameter, in this example the delta bit allocation type of parameter (i.e., the deltba or cpldeltba parameters, which affect the quantizer error within a critical band), has been adjusted and results in a modified quantizer error. The quantizer error may be modified even further and still remain imperceptible. Note that the modulation of the coding parameter results in a slightly different quantization error over the entire spectrum because the number of bits available is affected. This illustrates that modulation of coding parameters, and resulting quantizer resolution in certain bands, causes error in the entire spectrum, not only the band in which the parameter is modulated. In pass 4, the degree of modulation of the coding parameter has been adjusted again using information from pass 3 and the resulting quantizer error is as close as possible to the perceptual threshold. Although it is preferred to bring the quantizer error as close as possible to, but below, the perceptual threshold, when modulating one or more parameters that affect quantizer error, the invention also contemplates the modulation of one or more parameters such that the quantizer error is below but not close to the perceptual threshold, as for example in pass 3 of FIG. 27.

[0219] FIG. 28 illustrates the watermark embedding process wherein the chosen watermark coding parameter is the overall SNR offset type of parameter (i.e., the csnroffst, fsnroffst, cplfsnroffst or lfefsnroffst parameters). Note that in this example, modulation of the overall SNR offset parameter results in an exact match to the perceptual threshold. This is because the SNR offset type of parameter is a uniform offset of the perceptual threshold throughout the frequency spectrum. Accordingly, the process of adapting the quantizer error to the perceptual threshold using the SNR offset type of parameter requires only one step.
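The reason a single step suffices for a uniform offset may be sketched as follows. The sketch works on per-band levels in dB and simply finds the largest uniform shift that keeps the error at or below the threshold in every band; it does not model the actual units or sign convention of the SNR offset parameters, and the function name is hypothetical.

```python
def uniform_offset_db(threshold_db, error_db):
    """Largest uniform shift of the error curve that stays at or below the threshold."""
    return min(t - e for t, e in zip(threshold_db, error_db))


threshold_db = [60.0, 55.0, 50.0]
error_db = [54.0, 49.0, 44.0]      # error curve parallel to the threshold
offset = uniform_offset_db(threshold_db, error_db)
shifted = [e + offset for e in error_db]
```

When the error curve is parallel to the threshold, as assumed in this example, the shifted error coincides with the threshold in every band, which corresponds to the exact match described for FIG. 28.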

[0220] A further facet of this aspect of the present invention allows a user to control the offset of the perceptual threshold that controls the possible ‘gain’ or energy of the watermark. This may be a linear offset to the perceptual threshold or a more complicated function that allows more distortion in specific bands. This allows a user to control the ease of detection and the audibility of the final embedded signal. This may be accomplished by raising the perceptual threshold curve by a fixed amount. Furthermore, by modifying the perceptual threshold, the user may embed a watermark where the watermark coding margin is negative.

[0221] In perceptual coders, such as Dolby Digital, Dolby E, and MPEG-2 AAC coders, the quantization, or bit allocation, process is computed based on the number of bits available to the coder and the overall signal-to-noise ratio. Next, the perceptual threshold is compared to the quantizer error. If the distortion (difference between perceptual threshold and quantizer error) does not meet the completion requirements, the chosen coding parameter modulation is modified based on the distortion and the process is repeated until the distortion is acceptable.

[0222] In a preferred embodiment of this aspect of the invention, the distortion is computed from groups of banded coefficients (i.e., grouped by critical bands) that form the basis of the perceptual threshold. It should be noted that the perceptual threshold might also be based on the quantization error of individual spectral coefficients at the sacrifice of increased complexity.

[0223] Once the threshold is established, the distortion control portion of this aspect of the invention begins. The coding parameter under test is modulated in accordance with subsequent iterations of the distortion process. The modulation of the encoding parameter affects the result of the bit allocation of the spectral bands performed in the rate control process. The resulting threshold of the bit allocation is compared with the original perceptual threshold and the coding parameter is modulated iteratively until the completion requirements are met. If the requirements for completion are not met, the masking threshold is reformulated using the modulated parameter.

[0224] In a preferred embodiment of this aspect of the invention, the termination of the adaptive distortion process may occur when the perceptual threshold and the masking threshold are equivalent for any given band of interest and none of the bands of the masking threshold exceed the perceptual threshold. If the perceptual and masking thresholds never converge, further termination logic may be employed as long as the masking threshold does not exceed the perceptual threshold. Termination requirements exist in order to constrain complexity.

Decoder Parameter Modulation

[0225] FIG. 29 shows an aspect of the present invention in which the parameters of a perceptual audio decoder are modulated. In this example, the decoder employs a hybrid bit allocation (i.e., a perceptual model is conveyed from the encoder to the decoder). The received perceptually coded bitstream 40 is separated in the decoder into coding parameters 42 (representing the bit allocation model) and reformatted data 44 (i.e., the quantized data). Bit allocation 46 and inverse quantization 48 are performed. In the next step 50, a decision is made (Perceptual Threshold Calculated?). If not computed already (i.e., the first time through the loop), a perceptual threshold is calculated (step 52) based on the signal from the coded bitstream. If the perceptual threshold exists (i.e., after the first time through the loop), a comparison is made (step 54) between the inverse quantized signal and the threshold. Next, a decision is made (Acceptable Distortion?) in step 56. If the resulting distortion is acceptable (i.e., meets predefined termination requirements), then the process is complete and spectral coefficients are outputted to other functions in the decoder. If the distortion is not acceptable, the coding parameter being modulated is adjusted (step 58) and the process of bit allocation, inverse quantization, and perceptual threshold comparison is repeated. The coding parameter is initially modulated based on the watermark symbol (i.e., supplemental information) input 60 and is subsequently adjusted based on the perceptual threshold comparison.

[0226] A similar process may be employed in a perceptual audio decoder system employing a forward-adaptive bit allocation (i.e., a perceptual model is created in the encoder and explicitly sent to the decoder). The signal data is reformatted using the transmitted perceptual model. This perceptual model can then be modified by a parameter to embed a watermark. The watermarked version of the audio is compared to the unmarked signal. If the distortion measurement does not meet the specified, predefined completion requirement(s), the signal is reformulated using a modified parameter modulation value.

Controlling Parameter Modulation in Response to a Watermark Sequence and/or a Deterministic Sequence

[0227] In other aspects of the invention, modulation of one or more parameters is controlled indirectly by the supplemental information or watermark signal or sequence. For example, control of the modulation by the watermark is modified by a function of one or more other signals or data sequences including, for example, a set of instructions such as a deterministic sequence and/or the input signal applied to the coding system. FIG. 30 is a functional block diagram showing this aspect of the invention. As in the basic arrangement of FIG. 2, primary information is applied to a perceptual encoder function 2 that generates a digital bitstream that is received by a perceptual decoder function 4. In this aspect of the invention, the supplemental information is applied to a parameter controller function 62. The parameter controller function 62 also receives the primary information or one or more deterministic sequences or both the primary information and one or more deterministic sequences. The parameter controller 62 modifies the way in which secondary information modulates encoder function or decoder function parameters. It does so by modifying one or more sets of secondary information each with either a function of the primary information and/or a function of one or more deterministic sequences as next described. Because modified supplemental information from the parameter controller function may be applied either to the encoder function or to the decoder function or to both, dashed lines are shown from the supplemental information to the encoder function and to the decoder function, respectively. As in the case of the FIG. 2 arrangement, the output of the perceptual decoder function is primary information with embedded supplemental information. The supplemental information may be detectable in the decoder function output.

[0228] If modified supplemental information controls parameter modulation in both the encoder function 2 and the decoder function 4, typically, the information applied to one will be different from that applied to the other. For example, the supplemental information controlling the one or more encoder function parameters might represent a watermark identifying the owner of the audio or video content and the supplemental information controlling the one or more decoder function parameters might be a serial number identifying the equipment that presents the audio or video content to one or more consumers.

[0229] When the parameter controller 62 employs a deterministic sequence to modify the manner in which the supplemental information modulates one or more parameters, detection of the supplemental information or watermark in the decoder function output requires the generator equation and the key of the deterministic sequence to be known by the detector function. The generator equation may be known publicly, may be known a priori by the detector (but not publicly), or may be communicated to the detector via a secure channel. Similarly, the key may be known publicly, may be known a priori by the detector (but not publicly), or may be communicated to the detector via a secure channel. For the system to be secure, the only requirement is that the key not be publicly disclosed.
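The roles of the generator equation and the key may be illustrated with a toy generator. The sketch below assumes, purely for illustration, that the deterministic sequence is produced by a 16-bit linear-feedback shift register: the feedback taps stand in for the generator equation and the nonzero seed stands in for the key. The specification does not prescribe this generator, and a plain shift register of this kind is not cryptographically secure; the function name `deterministic_sequence` is hypothetical.

```python
from typing import List

def deterministic_sequence(key: int, length: int) -> List[int]:
    """Generate `length` bits from a 16-bit Fibonacci LFSR seeded with `key`.

    The fixed feedback taps play the role of the generator equation and
    the seed plays the role of the key (both are illustrative assumptions).
    """
    state = key & 0xFFFF
    if state == 0:
        raise ValueError("key must be a nonzero 16-bit value")
    bits = []
    for _ in range(length):
        bits.append(state & 1)  # emit the least significant bit
        # XOR of the tapped bit positions 0, 2, 3 and 5.
        feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (feedback << 15)
    return bits

embedder_ds = deterministic_sequence(key=0xACE1, length=32)
detector_ds = deterministic_sequence(key=0xACE1, length=32)   # same equation, same key
wrong_key_ds = deterministic_sequence(key=0xBEEF, length=32)  # same equation, other key

print(embedder_ds == detector_ds)    # True: the detector can regenerate the sequence
print(embedder_ds == wrong_key_ds)   # False: without the key the sequence differs
```

A detector holding both the taps and the key regenerates exactly the sequence used at embedding, which is the condition the paragraph above states for detection.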

[0230] When the parameter controller 62 employs the input signal to modify the manner in which the supplemental information modulates one or more parameters, detection of the supplemental information or watermark in the decoder function output requires the source signal or at least certain information about the source signal (e.g., the characteristics of the source signal that the parameter controller is programmed to respond to) to be known by the detector function. This may be done by communicating the source signal or, preferably, the characteristics of the source signal that the parameter controller is programmed to respond to, to the detector function. If the source signal, rather than the relevant characteristics of the source signal that the parameter controller is programmed to respond to, is communicated to the detector function, the detector function may determine the relevant characteristics independently based on an analysis of the source signal and the decoder function output. However, errors may occur because the characteristics are originally determined based on the source signal with no quantizer error.

Controlling Parameter Modulation in Response to a Deterministic Sequence

Modifying the Rate of Watermark Symbol Transitions

[0231] One variation of this aspect of the present invention involves controlling, with a deterministic sequence, the rate of parameter modulation state transitions, and, consequently, the rate of watermark symbol transitions. In particular, it involves varying, in response to the deterministic sequence, the duration of the parameter modulation states and, consequently, the duration of the watermark symbol states. If watermark symbol transitions are embedded at a constant rate, repetitive sequences in the watermark symbol pattern may be perceptible. By modifying the duration of the parameter modulation states and, consequently, the duration of the symbol, repetitive effects may be minimized. Table 1 shows an example in which the duration of the parameter modulation state and, consequently, the duration of the watermark symbol, is dependent on a deterministic sequence, thus resulting in the pattern shown as the modified sequence. In this particular example the watermark symbol is repeated if the deterministic sequence (DS) value is equal to “1”. If the DS has a value of “0”, the watermark symbol is not repeated. It should be noted that the period of the watermark symbol pattern increases based on the occurrences of the value of “1” in the deterministic sequence. Accordingly, a finite sequence should be used that resets appropriately so that synchronization is possible during detection.

TABLE 1
Sequence Type                  Sequence
Deterministic sequence (DS)    10110010
Watermark sequence (WS)        01011100
Modified sequence              001001111000
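The construction of the modified sequence in Table 1 may be reproduced with a short sketch. The function names `stretch_symbols` and `recover_symbols` are hypothetical, and representing the sequences as strings of "0" and "1" is an assumption made for readability. The second function illustrates why the detector needs the deterministic sequence: it uses the DS to decide how many positions each symbol occupies.

```python
def stretch_symbols(ws: str, ds: str) -> str:
    """Emit each watermark symbol twice where the DS bit is '1', once otherwise."""
    if len(ws) != len(ds):
        raise ValueError("WS and DS must have the same length")
    return "".join(w * 2 if d == "1" else w for w, d in zip(ws, ds))

def recover_symbols(modified: str, ds: str) -> str:
    """Undo stretch_symbols, given the same deterministic sequence."""
    out, pos = [], 0
    for d in ds:
        out.append(modified[pos])
        pos += 2 if d == "1" else 1  # skip the repeated copy when DS is '1'
    return "".join(out)

DS = "10110010"
WS = "01011100"
modified = stretch_symbols(WS, DS)
print(modified)                       # 001001111000, as in Table 1
print(recover_symbols(modified, DS))  # 01011100
```

The modified sequence is longer than the watermark sequence by the number of "1" values in the DS, which is the increase in period noted above.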

Selecting the Parameter for Embedding the Watermark

[0232] In accordance with a further variation of this aspect of the invention, a deterministic sequence selects the parameter or parameters used to embed the watermark. Generally, it is possible to employ any one of several parameters to embed a watermark. For example, the modulation of one parameter may result in a spectral energy modification in a particular frequency range and the modulation of another parameter results in a reduction in bandwidth of the decoded signal. If only one parameter is modulated, the resulting watermark may be more perceptible to a person with acute sensitivity to spectral energy modulation. On the other hand, if the embedding technique that is used switches between modulating one parameter and modulating another, the resulting watermark may be less perceptible. As the number of watermark embedding parameters increases, this effect becomes more pronounced (the impairment introduced by the watermark is more noise-like).

[0233] Table 2 illustrates two ways in which coding parameters may be selected for modulation. In the first example, shown in part “a” of Table 2, parameters 1 and 2 take on the value of the watermark sequence (WS) depending on the deterministic sequence (DS). For example, parameter 1 is modulated to a state reflecting the WS value if the DS value is “0”, otherwise it is modulated to a state reflecting a “0” value (either state may be, but need not be, the parameter's default value). Conversely, parameter 2 is modulated to a state reflecting the WS value if the DS value is “1”, otherwise it is modulated to a state reflecting a “0” value (either state may be, but need not be, the parameter's default value). The sequences from both parameters and from the DS are required to detect the WS in this example. In the second example, shown in part “b” of Table 2, parameters 1 and 2 are modulated to a state reflecting the value of the WS depending only on the WS itself. For example, parameter 1 is modulated from its default state to a state reflecting a WS value of “0” and parameter 2 is modulated from its default state to a state reflecting a WS value of “1”. In this way, either parameter may be detected independently, as they both convey the WS.

TABLE 2
     Sequence Type                  Sequence
     Deterministic sequence (DS)    10110010
     Watermark sequence (WS)        01011100
a    Parameter 1 = WS, DS (0)       01001100
     Parameter 2 = WS, DS (1)       00010000
b    Parameter 1 = 1, WS (0)        10100011
     Parameter 2 = 1, WS (1)        01011100
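Both parts of Table 2 may be reproduced with the following sketch. The helper names `gate`, `indicator`, and `recover_from_part_a` are hypothetical, and the string representation of the sequences is an assumption. Part “a” passes the WS through to one parameter or the other according to the DS; part “b” marks, for each parameter, the positions at which the WS takes a given value.

```python
def gate(ws: str, ds: str, ds_state: str) -> str:
    """Part a: carry the WS bit where the DS bit equals ds_state, else '0'."""
    return "".join(w if d == ds_state else "0" for w, d in zip(ws, ds))

def indicator(ws: str, ws_state: str) -> str:
    """Part b: emit '1' where the WS bit equals ws_state, else '0'."""
    return "".join("1" if w == ws_state else "0" for w in ws)

def recover_from_part_a(p1: str, p2: str, ds: str) -> str:
    """Read each WS bit from the parameter that the DS selected."""
    return "".join(a if d == "0" else b for a, b, d in zip(p1, p2, ds))

DS = "10110010"
WS = "01011100"

p1a, p2a = gate(WS, DS, "0"), gate(WS, DS, "1")
p1b, p2b = indicator(WS, "0"), indicator(WS, "1")

print(p1a, p2a)  # 01001100 00010000
print(p1b, p2b)  # 10100011 01011100
print(recover_from_part_a(p1a, p2a, DS))  # 01011100
```

In part “b” each parameter alone determines the WS (parameter 2 equals it and parameter 1 is its complement), which is why either may be detected independently.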

Modifying the Rate at Which the Choice of Parameters for Modulation Changes

[0234] According to a further variation of this aspect of the invention, the choice of parameters for modulation may change depending on a deterministic sequence. This may further reduce perceptibility of the watermark, as periodic effects introduced by changing the embedding technique at a constant rate are eliminated. This embodiment is illustrated in Table 3. In this example, parameter 1 is modulated to a state reflecting the inverse of the WS (either state may be, but need not be, the parameter's default value) and the symbol repeats when the DS value is “1” and otherwise it is not repeated. Parameter 2 is modulated to a state reflecting the default value of the WS (either state may be, but need not be, the parameter's default value) and the symbol repeats when the DS value is “1” and otherwise it is not repeated. As in the example of part b of Table 2, both parameters convey the watermark.

TABLE 3
Sequence Type                                         Sequence
Deterministic sequence (DS)                           10110010
Watermark sequence (WS)                               01011100
Modified sequence (rate of technique transitions)
  Parameter 1 = inverse of (WS), DS (0)               110110000110
  Parameter 2 = WS, DS (1)                            001001111001

Controlling Parameter Modulation in Response to the Characteristics of the Source Signal

Modifying the Rate of Watermark Symbol Transitions Using Source Signal Analysis

[0235] Another variation of this aspect of the invention involves analyzing the characteristics of the source signal, and then adaptively controlling the rate of parameter modulation transitions and, consequently, the rate of watermark symbol transitions based on the results of this analysis. In particular, it involves varying, in response to characteristics of the source signal, the duration of the parameter modulation states and, consequently, the duration of the watermark symbol states. For example, rapidly changing signal conditions may provide a useful degree of temporal masking that may be used to lessen the perceptibility of a watermark symbol transition. If the amplitude of the time-domain source signal varies beyond a pre-determined threshold from frame 1 to frame 2 (assuming that the source signal has been formatted into a digital signal stream having frames), the watermark symbol may be allowed to change from one value in frame 1 to another value in frame 2. In frame 3, if the characteristic of the source signal does not vary beyond the threshold from the previous frame(s), the symbol may not be permitted to change values. By correlating watermark symbol transitions to masking events or other “change-friendly” conditions in the underlying source signal, imperceptibility of the watermark may be improved.
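The frame-by-frame gating described above may be sketched as follows. Several details are assumptions made for illustration: the amplitude of a frame is taken to be its peak absolute sample value, the change is measured between consecutive frames only, and the symbol is advanced to the next watermark symbol whenever a change is permitted. The names `transition_allowed` and `schedule_symbols` are hypothetical.

```python
from typing import List, Sequence

def transition_allowed(frames: Sequence[Sequence[float]], threshold: float) -> List[bool]:
    """For each frame after the first, report whether its peak amplitude
    differs from the previous frame's peak by more than the threshold."""
    peaks = [max(abs(x) for x in frame) for frame in frames]
    return [abs(cur - prev) > threshold for prev, cur in zip(peaks, peaks[1:])]

def schedule_symbols(ws: str, allowed: Sequence[bool]) -> str:
    """Hold the current symbol until a transition is allowed, then move on
    to the next watermark symbol. Returns one symbol per frame."""
    index = 0
    out = [ws[index]]
    for ok in allowed:
        if ok and index + 1 < len(ws):
            index += 1
        out.append(ws[index])
    return "".join(out)

frames = [
    [0.1, -0.1],   # frame 1: quiet
    [0.9, 0.2],    # frame 2: large change, transition permitted
    [0.85, 0.1],   # frame 3: small change, symbol held
    [0.1, 0.0],    # frame 4: large change, transition permitted
]
allowed = transition_allowed(frames, threshold=0.5)
print(allowed)                           # [True, False, True]
print(schedule_symbols("010", allowed))  # 0110
```

A practical system would likely use a psychoacoustically motivated transient detector rather than a raw peak difference; the sketch only fixes one simple rule so that the gating logic is concrete.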

[0236] In Table 4, a signal-defined sequence (SDS) represents the output of a thresholding process, such as transient detection. For this example, an SDS value of “0” indicates that no transient condition occurred and a value of “1” indicates that a transient was present in the block. In part “a” of Table 4, the WS value is repeated (by repeating the same modulation state of the parameter) if the SDS has a value of “1”. If the SDS has a value of “0”, the watermark symbol is not repeated. In this example, it is assumed that a single coding parameter conveys the watermark.

Modifying the Rate at Which the Choice of Parameters for Modulation Changes Using Source Signal Analysis

[0237] In another aspect of the invention, the just-explained aspect is modified so as to use the characteristics of the source signal to modify the rate at which the choice of parameters for modulation changes, as opposed to the rate of parameter modulation. As in the just-explained aspect, the benefit is that the transitions are less perceptible if they occur when the source signal provides temporal masking or other “change-friendly” conditions. An example of this embodiment is illustrated in part b of Table 4. In this example, parameter 1 is modulated to a state reflecting the inverse of the WS (either state may be, but need not be, the parameter's default value) and the symbol repeats when the SDS value is “1” and otherwise it is not repeated. Parameter 2 is modulated to a state reflecting the default value of the WS (either state may be, but need not be, the parameter's default value) and the symbol repeats when the SDS value is “1” and otherwise it is not repeated. As in the example of part b of Table 2, both parameters convey the watermark. This approach is similar to that shown in Table 3, but differs only in that the transition rate is here defined by the SDS.

TABLE 4
     Sequence Type                                         Sequence
     Signal-defined sequence (SDS)                         00101110
     Watermark sequence (WS)                               01011100
a    Modified sequence (rate of symbol transitions)
       Parameter 1                                         010011111000
b    Modified sequence (rate of technique transitions)
       Parameter 1 = inverse of (WS), SDS (0)              101100000111
       Parameter 2 = WS, SDS (1)                           010011111000
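The sequences of Table 4 may be reproduced with the sketch below, in which the signal-defined sequence takes the place of a deterministic sequence as the control that decides whether a symbol is repeated. The helper names `invert` and `stretch` are hypothetical and the string representation is an assumption.

```python
def invert(bits: str) -> str:
    """Complement every bit of a '0'/'1' string."""
    return bits.translate(str.maketrans("01", "10"))

def stretch(bits: str, control: str) -> str:
    """Repeat a symbol once more wherever the control bit is '1'."""
    return "".join(b * 2 if c == "1" else b for b, c in zip(bits, control))

SDS = "00101110"   # 1 = transient detected in the block
WS = "01011100"

part_a = stretch(WS, SDS)              # single parameter conveys the WS
part_b_p1 = stretch(invert(WS), SDS)   # parameter 1: inverse of the WS
part_b_p2 = stretch(WS, SDS)           # parameter 2: the WS itself

print(part_a)     # 010011111000
print(part_b_p1)  # 101100000111
print(part_b_p2)  # 010011111000
```

Because the SDS is derived from the source signal rather than from a key, a detector must obtain or re-derive the SDS before it can undo the stretching.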

Selecting the Parameter for Embedding the Watermark Using Source Signal Analysis

[0238] In another aspect of the present invention, the number of parameters in the set available for modulation is modified based on characteristics of the source signal. Suppose a particular watermarking system is able to embed a watermark by modulating any of several different parameters (e.g., parameters resulting in spectral energy boost, temporal noise insertion, bandwidth reduction, etc.). Depending on the current characteristics of the source signal, not all of these parameters may cause imperceptible changes in the decoded signal. For example, if the source signal is stationary, temporal noise insertion may be more perceptible than a spectral energy boost in a frequency range that is perceptually masked. As a result, it may be beneficial to reduce the available set of parameters to disallow those that are likely to cause results that are more perceptible for the current signal characteristic.

[0239] Table 5 shows an example of a signal-defined sequence (SDS) based on the same thresholding process (transient detection) as described previously. An SDS value of “1” indicates that a transient condition exists in the block and an SDS value of “0” indicates that no transient condition exists. In Table 5, parameters 1 and 2 nominally convey the watermark when no transient condition exists (SDS=0), with parameter 1 having a modulation state reflecting a value of “1” for WM values of “0” and having a modulation state reflecting a value of “0” otherwise and parameter 2 having a modulation state reflecting a value of “1” for WM values of “1” and having a modulation state reflecting a value of “0” otherwise. If a transient condition exists (SDS=1), then parameters 3 and 4 are modulated, which parameters optimally cause temporal distortion, instead of parameters 1 and 2, which cause spectral distortion. Having reduced the number of parameters, a deterministic sequence may then be used to select parameters from the smaller set, thereby preserving the benefit of switching between or among parameters, while at the same time adaptively choosing among parameters that are preferable in view of current source signal characteristics.

TABLE 5
Sequence Type                        Sequence
Signal-defined sequence (SDS)        00101110
Watermark sequence (WS)              01011100
Parameter 1 = 1, WS (0), SDS (0)     10000001
Parameter 2 = 1, WS (1), SDS (0)     01010000
Parameter 3 = 1, WS (0), SDS (1)     00100010
Parameter 4 = 1, WS (1), SDS (1)     00001100
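The four parameter sequences of Table 5 may be reproduced as follows. The function name `select` and the dictionary layout are hypothetical; the sketch simply marks, for each parameter, the blocks in which both the WS and the SDS take the values assigned to that parameter.

```python
def select(ws: str, sds: str, ws_state: str, sds_state: str) -> str:
    """Emit '1' in blocks where WS == ws_state and SDS == sds_state, else '0'."""
    return "".join(
        "1" if (w == ws_state and s == sds_state) else "0"
        for w, s in zip(ws, sds)
    )

SDS = "00101110"
WS = "01011100"

table5 = {
    1: select(WS, SDS, "0", "0"),  # no transient, WS = 0
    2: select(WS, SDS, "1", "0"),  # no transient, WS = 1
    3: select(WS, SDS, "0", "1"),  # transient,    WS = 0
    4: select(WS, SDS, "1", "1"),  # transient,    WS = 1
}
for number, sequence in table5.items():
    print(f"Parameter {number}: {sequence}")

# Exactly one of the four parameters is active in every block.
active_per_block = [sum(int(seq[i]) for seq in table5.values()) for i in range(len(WS))]
print(active_per_block)  # [1, 1, 1, 1, 1, 1, 1, 1]
```

The one-active-parameter-per-block property is what allows the set of parameters in use to shrink to those suited to the current signal condition.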

Controlling Parameter Modulation in Response to a Deterministic Sequence and the Characteristics of the Source Signal

[0240] In addition to controlling parameter modulation using only a deterministic sequence or only characteristics of the input signal, the invention also contemplates controlling parameter modulation in response to both a deterministic sequence and characteristics of the input signal.

[0241] There are multiple ways to combine the use of a deterministic sequence and the source signal characteristics in order to control parameter modulation. Doing so may further improve imperceptibility and/or robustness. In one such method, a deterministic sequence selects which subset of coding parameters is used for different states of the signal characteristics. More particularly, using the example of Table 5 above, the first two parameters are chosen for modulation when a transient does not exist (SDS=0) and those parameters are chosen based on a deterministic sequence, DS. Table 6 illustrates this method.

TABLE 6
Sequence Type                                Sequence
Signal-defined sequence (SDS)                00101110
Deterministic sequence (DS)                  10110010
Watermark sequence (WS)                      01011100
Parameter 1 = 1, SDS (0), DS (0), WS (0)     00000001
Parameter 2 = 1, SDS (0), DS (0), WS (1)     01000000
Parameter 3 = 1, SDS (0), DS (1), WS (0)     10000000
Parameter 4 = 1, SDS (0), DS (1), WS (1)     00010000
Parameter 5 = 1, SDS (1), DS (0), WS (0)     00000000
Parameter 6 = 1, SDS (1), DS (0), WS (1)     00001100
Parameter 7 = 1, SDS (1), DS (1), WS (0)     00100010
Parameter 8 = 1, SDS (1), DS (1), WS (1)     00000000
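Table 6 may be reproduced by treating the three bits (SDS, DS, WS) of each block as a three-bit index that picks one of eight parameters. This indexing is an observation about the row order of Table 6 rather than something the specification states, and the function name `one_hot_parameters` is hypothetical.

```python
from typing import List

def one_hot_parameters(sds: str, ds: str, ws: str) -> List[str]:
    """Return eight sequences; entry k (0-based) is parameter k + 1.

    In each block, the parameter whose index is 4*SDS + 2*DS + WS is set
    to '1' and all other parameters are '0'.
    """
    n = len(ws)
    rows = [["0"] * n for _ in range(8)]
    for block, (s, d, w) in enumerate(zip(sds, ds, ws)):
        rows[4 * int(s) + 2 * int(d) + int(w)][block] = "1"
    return ["".join(row) for row in rows]

SDS = "00101110"
DS = "10110010"
WS = "01011100"

table6 = one_hot_parameters(SDS, DS, WS)
for number, sequence in enumerate(table6, start=1):
    print(f"Parameter {number}: {sequence}")
```

Parameters 5 and 8 stay at all zeros in this example only because the corresponding combinations of SDS, DS, and WS never occur in the eight blocks shown.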

[0242] In another example, the deterministic sequence modifies the rate of transitions of the watermark sequences that are modified by a signal-defined sequence. Table 7 illustrates this method. The second column illustrates the first step of altering the embedding technique based on the SDS and the third column illustrates the second step of further altering the rate of the sequences based on the DS. As in previous examples, the sequence value is repeated if the DS has a value of “1”. If the DS has a value of “0”, the sequence value is not repeated.

TABLE 7
Sequence Type                        Sequence (DS)    Sequence (DS/SS)
Signal-defined sequence (SDS)        00101110
Deterministic sequence (DS)          10110010
Watermark sequence (WS)              01011100
Parameter 1 = 1, WS (0), SDS (0)     10000001         110000000001
Parameter 2 = 1, WS (1), SDS (0)     01010000         001001100000
Parameter 3 = 1, WS (0), SDS (1)     00100010         000110000110
Parameter 4 = 1, WS (1), SDS (1)     00001100         000000011000
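The two steps of Table 7 may be sketched as a selection followed by a stretch. The helper names `select` and `stretch` are hypothetical. Note that the third-column values of Table 7 are reproduced when the repetition in the second step follows the bits of the deterministic sequence, which is what the sketch assumes.

```python
def select(ws: str, sds: str, ws_state: str, sds_state: str) -> str:
    """Step 1: '1' in blocks where WS == ws_state and SDS == sds_state."""
    return "".join(
        "1" if (w == ws_state and s == sds_state) else "0"
        for w, s in zip(ws, sds)
    )

def stretch(bits: str, ds: str) -> str:
    """Step 2: repeat a value once more wherever the DS bit is '1'."""
    return "".join(b * 2 if d == "1" else b for b, d in zip(bits, ds))

SDS = "00101110"
DS = "10110010"
WS = "01011100"

states = {1: ("0", "0"), 2: ("1", "0"), 3: ("0", "1"), 4: ("1", "1")}
table7 = {}
for number, (ws_state, sds_state) in states.items():
    step1 = select(WS, SDS, ws_state, sds_state)
    table7[number] = (step1, stretch(step1, DS))
    print(f"Parameter {number}: {table7[number][0]} -> {table7[number][1]}")
```

Each stretched sequence has twelve values because the DS contains four “1” bits, so all four parameters remain aligned with one another after the second step.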

[0243] With each of the examples in which multiple coding parameters convey the embedded sequence, there also exists the possibility of adding redundancy by applying the same watermarking sequence to multiple coding parameters to increase error resiliency to attack or processing. To facilitate lower-complexity detection, such coding parameters may have constrained relationships, or a predetermined hierarchy, such that if one parameter has errors the detector may be able to recover the message from another coding parameter.

[0244] Additionally, a deterministic sequence may be used to modulate simultaneously one or more other coding parameters to make it difficult for an attacker to deduce which parameter is carrying the watermark. In an example shown in Table 8, parameter 1 conveys the watermark sequence and the deterministic sequence specifies which of parameter 2 or parameter 3 will vary based on the watermark sequence. Parameters 2 and 3 in this case do not carry the watermark, but act as decoys. In this example, the decoy parameters will equal the WS for the appropriate state of the DS, and will be “0” otherwise.

TABLE 8
Sequence Type                  Sequence
Deterministic sequence (DS)    10110010
Watermark sequence (WS)        01011100
Parameter 1 = WS               01011100
Parameter 2 = WS, DS (0)       01001100
Parameter 3 = WS, DS (1)       00010000

CONCLUSION

[0245] It should be understood that implementation of other variations and modifications of the invention and its various aspects will be apparent to those skilled in the art, and that the invention is not limited by the specific embodiments described. It is therefore contemplated to cover by the present invention any and all modifications, variations, or equivalents that fall within the true spirit and scope of the basic underlying principles disclosed and claimed herein.

[0246] The present invention and its various aspects may be implemented as software functions performed in digital signal processors, programmed general-purpose digital computers, and/or special purpose digital computers. Interfaces between analog and digital signal streams may be performed in appropriate hardware and/or as functions in software and/or firmware.

1. A method of modifying the operation of the encoder function and/or the decoder function of a perceptual coding system in accordance with supplemental information so that the supplemental information may be detectable in the output of the decoder function, comprising modulating one or more parameters in said encoder function and/or said decoder function in response to the information content of said supplemental information.
2. A method according to claim 1 wherein said perceptual coder is an audio coder of the type that employs a hybrid forward/backward bit allocation.
3. A method according to claim 2 wherein said one or more parameters include one or more parameters that fall within one or more of the following categories: masking model and bit allocation, coupling between or among channels, frequency bandwidth, dither control, phase relationship, and time/frequency transform window.
4. A method according to claim 1 wherein said perceptual coder is an audio coder of the type that employs a forward bit allocation.
5. A method according to claim 4 wherein said one or more parameters include one or more parameters that fall within one or more of the following categories: masking model and bit allocation, coupling between or among channels, temporal noise shaping filter coefficients, and time/frequency transform window.
6. A method according to claim 1 wherein said perceptual coder is a video coder and wherein said one or more parameters include one or more parameters that fall within one or more of the following categories: frame type, and motion control.
7. A method according to claim 1 wherein said one or more parameters are selected from the parameters that affect in the decoded output signal one or more of: signal-to-noise ratio, quantizer noise, time relationship between or among channels, frequency bandwidth, shaped noise, phase relationship between or among channels, and wide spectrum, time-aliasing noise.
8. A method according to claim 1 wherein said one or more parameters are modulated by performing one of the following acts: varying a two-valued parameter between its two values, varying the parameter between or among its default value and one or more other values, and varying the parameter between or among values other than its default value.
9. A method according to claim 1 wherein the degree of modulation of said one or more parameters is controlled.
10. A method according to claim 9 wherein the degree of modulation of said one or more parameters is controlled to limit the perceptibility of artifacts in the decoded output signal resulting from the modulation of said one or more parameters.
11. A method according to claim 1 wherein the modulation of a parameter is indirectly controlled in accordance with supplemental information such that one or more of the following modulation characteristics: the selection of one or more parameters for modulation, the rate of parameter selection, and the rate of parameter state transitions is determined in response to supplemental information and as a function of one or more other signals or sequences.
12. A method according to claim 11 wherein said one or more other signals or sequences include either or both of the following: a set of instructions, and characteristics of the input signal to the encoder of the coding system.
13. A method according to claim 12 wherein said set of instructions includes a deterministic sequence.
14. A method according to claim 13 wherein said deterministic sequence is a pseudo-random-number sequence.
15. A method according to claim 1 wherein said one or more parameters are modulated in said encoder function.
16. A method according to claim 1 wherein said one or more parameters are modulated in said decoder function.
17. A method according to claim 1 wherein said one or more parameters are modulated in said encoder function and in said decoder function.
18. A method for modifying the operation of the encoder and/or the decoder of a perceptual coding system in accordance with supplemental information and for detecting the supplemental information in the output of the decoder according to claim 1, further comprising detecting the supplemental information in the output of the decoder function.
19. A method according to claim 18 wherein the act of detecting the supplemental information in the output of the decoder function is accomplished by one of the following acts: observing the decoded signal, comparing the decoded signal to the signal applied to the encoder function, and comparing the decoded signal to the decoded signal from a substantially identical perceptual coding system in which no parameters in the encoder function or decoder function are modulated in response to supplemental information.
20. A method according to claim 19 wherein the act of observing the decoded signal comprises comparing the decoded signal to a time delayed version of itself.
21. A method according to claim 1 wherein one or more perceptual coding parameters are modulated in response to said supplemental information.